{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Classification (1) – an issue with distance measures, and an implementation of Nearest Neighbour classification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we will expand on some of the concepts of \n",
    "classification, starting with an experiment with distance measures on data, then looking into the $k$-Nearest Neighbour algorithm. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1) Distance measures for high-dimensionality data\n",
    "\n",
    "Algorithms such as $k$-Nearest Neighbour are conceptually very simple: we predict the class of an unlabelled *query* data point by looking at all the labelled data points in our training set, and predicting that the query has the same class as the most similar data point(s) in that set. So, all we need is a way of measuring similarity. The well-known *Euclidean distance measure* would seem to be a good choice. However, while we are very familiar with Euclidean distance in 2 and 3 dimensions, there was a warning (Slide 62 of the \"Classification (1)\" lecture) that in high dimensions there is a problem. What was this problem?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pairwise distances in high-dimensional spaces \n",
    "\n",
    "**Answer**: in high-dimensional spaces everything is far away from everything else, and so pairwise distances become uninformative."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But what does this actually mean? There is a mathematical argument to show that this statement is true, but an alternative approach is simply to simulate what happens. One approach is to randomly generate $N$ points inside a $d$-dimensional cube centred on zero, such as $[-0.5, 0.5]^{d}$, and calculate the pairwise distances among the $N$ points. Then, for every data point, we calculate the ratio of the minimum distance to the maximum distance over all of the other data points. The mean of these ratios summarises how widely the pairwise distances are spread in that dimensionality: a value close to 1 means that the nearest and farthest points are almost equally far away. We run the simulation over a range of dimensions (the code below steps from 1 to 501 dimensions) and plot the ratios on a line chart using the ```matplotlib``` library.\n",
    "\n",
    "You should use the ```numpy``` library for this, and in particular the linear algebra methods to calculate distances such as the [L2 norm](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.norm.html#numpy.linalg.norm). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEWCAYAAABi5jCmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3Xl8XHW9//HXu2nTNG2T7vuSQktt2SEtsiqIgoiAigKKsilwFRR31HsV0HvdFUF+FxFLWUTAhWtBFBChyCJNCpWlbIWm+550T5sm+fz++H7TTsIkOS2dTGbm83w85jEzZ5vPd+bM+Zxzvud8vzIznHPOuRY9sh2Ac8657sUTg3POuVY8MTjnnGvFE4NzzrlWPDE455xrxRODc865VvIiMUj6nqS1klbG9x+StETSZkmHZjGuDuOIw/fJRmy7S9Kxkl7tos+6StId8fW4+D0VdcVnJyHpRkn/1QWfUyPpxPj6m5JuzvRndqW9VaauWjclmaSJ8XWXrAPZoly4j0FSDTAcaEoZPNPMLpM0FngNGG9mq+P0bwBfMrM/v83PNWCSmS3Yw/n3ShyFRtJVwEQzOzfbsWRTXO8/bWZ/bzO8AlgI9DKzxq6PrDDtyfagvd+wu+uZ7QB2wwfb+XLHA+takkLKsJe6JqwOdXkcknr6xsLlOl+Ps8zMuv0DqAFOTDP8RKAeaAY2A7+LzwZsAd6I040C/gisIexpfT5lGUXAN4E3gE3AXGAs8HjKcjYDZ6X5/B7AfwKLgNXAbUA50DtdHGnmN8KeMcBM4AbgLzGOZ4B9U6bdH3gYqAVWAd+Mw68C/gDcAWwEPh3jujKWaR1wDzAoZVm/B1YCG2I5908ZdwowP8awDPhKHP5uYGmb3+QrwPNxOXcDJSnjvwasAJbHmHaWNc33MAGYHT/zYeCXwB1xXEWct2d8/xjwPeCp+B3fBwwGfhvLXwVUpCz7HSnf26vAx1LGtfudAwJ+Hn/XDbGcB6TM972U5XwGWBA/YxYwqs1vfCnwOlAXP6/lSH1f4B/xN1obyzAg3Xoff+eW72RxXO7m+HhX/OwDU+YdRvhvDE263rb5vs+Ln7MW+FYH/82ZwI3xO94Uf8fxKeN/ASyJv81c4NiUcallavnci+LnPg7cCnw5jh8dx382vp8Yyyzeum5+nbDuboq/+XtSyt3u/yJN2b7KrnX4Qt76f/1efD0EuB9YH2P6Z/ys2wnbpvr4O30twf9vJnu2HWi3bEAJYfuwLsZYBQzvcJubzQ1+0gftJIZ0G6yUP2PLD9gjrpDfBoqBfYA3gZNSfvwXgMlxJTsYGNx2Oe189oWEDcI+QD/gT8Dt6eJoZ/62K1otMJ1wJPdb4K44rn9cQb8cf+T+wBEpf64dwBmxrH2AK4B/AWMISepXwO/axN0/jrsWmJcybgXxzwsMBA5L9z3H32QOIekOAl4GLo3jTo4r/v5AKeEP0lFieBr4WYznOMIfoqPEsICwUS0nJLHXCDsJPQkbuVvitH0JG6UL4rjDCBu6/RN85ycR1psBcb2YAoxMs1E4IS7zsBj/9cDjbX7j++NyxhF2Tk5O2bi9N843lLCRuDbdek/6jWjPlGn/H/DDlPdfAO7b3fU2Zdm/JqxLBwPbgSntLGtm/L2Oi+X4BfBEyvhzCYm7J2H9XUncgWinTLfF361PjPO+OP7jhI3e3Sll+HPbdZPwP15CTM5xuS3JvsP/RZtynUzY8B4Q47mT9hPD9wnJsVd8HMuu5L/zN0z4/5vJnm0H2i0bcAlhB6qUsCN8OFDW4TY3kxv0vfWIX+5mQrZreXwm3QYrzQb3CGBxm/HfYNfG41Xg9HY+t7MN+yPEPZiUlXIHuzZiu5sYbk4ZdwrwSnx9DvBcO8u4ipQNURz2MnEvKb4fmRpXm2kHxDha9hgXxxWprM10rb7n+Jucm/L+R8CN8fUM4Psp4ya2910QNpaN
QN+UYXfScWL4Vsq0PwX+mvL+g8Q/GnAW8M82n/cr4DsJvvMTCAnnnUCPNsuYya6Nwm+AH6WM6xe/64qU3/iYlPH3AFe281uekfo7s3uJ4QjCBrFHfF9NytFR0vU2ZdljUsbPAc5uZ1kziRuulPI3AWPbmb4OOLiDMu2TMu2+hP96D8KG9xJ2JYBbCfV30DoxTCQcBZ1IqIPZ0//FDOAHKe/3o/3EcA3wZ9Kv3zt/w3a+j7b/v5ns2Xag3bIREtFTwEHtxdH2kUtXJZ1hZgNSHr9OON94YJSk9S0Pwqmj4XH8WMKeyJ4YRTgcb7GI8EMMTz95p1amvN5K+JNB5zEuafN+PHBvSnlfJvxZh0sqkvQDSW9I2khYcSEcDgN8hLAyLpI0W9KRexDvqDYxtY0v1Sigzsy2pAxb1N7E0aqU1/Vp3rfEMR44os1v/wlgRGdlMLN/EE5p3QCsknSTpLJ24t8Zr5ltJhyyj+7sMyQNk3SXpGXxt7iDXb/DbjGzZwinLd8l6R2EDeSsdiZPst6299ums/P3jeWvjZ+BpC9LelnShvj9l9NxGVOX9QZhh/AQwl74/cBySZMJp89mt53ZQsXwFYSkszp+v6Pi6Hb/F2niaLsOd7RO/phwBPaQpDclXdnehAn+f7Bn24GOynY78CBwl6Tlkn4kqVcH5cmpxLCnlgAL2ySV/mZ2Ssr4ffdw2csJP0iLlr3fVekn32OdxWhppn9/mzKXmNkywiH56YQ9qnLCnhqE0yWYWZWZnU44R/1/hD3c3bWCcEjbYmwn0w6U1Ddl2Lg9+Mx0lgCz23wP/czsP5LMbGbXmdnhhFNi+xFOO7bVah2I5RhMOMfdme8TfruDzKyMcNpFSUJrZ/itcRmfBP5gZtvamW5vr7c7f19J/QinFpdLOpZwvv9jwEAzG0A4r95RGduWbTZwJlAc19/ZwKcIpznnpV2A2Z1mdgyhjAb8MI7q6H/R1gpar7ftrpNmtsnMvmxm+xCOWL8k6T3tlKfD/18nOtoOtFs2M9thZleb2VTgKOBUwnfYrkJIDHOAjZK+LqlPzNgHSJoWx98MfFfSJAUHSRocx60inIdtz++AL0qaEP8Q/0M4B7q3r6a4Hxgh6QpJvSX1l3REB9PfCPy3pPEAkoZKOj2O6084Z7yOcM7xf1pmklQs6ROSys1sB6HCsInddw9wgaQpkkoJ9TtpmdkiwmmPq+PnH0P4c+0N9wP7SfqkpF7xMU3SlM5mjNMdEfestgDbSP9d3Eko6yGSehO+z2fMrCZBfP2Jp0gljSZ94klnDaFSs+26eTvwIUJyuK2D+ff2enuKpGMkFQPfJZR/CaF8jTHenpK+DaQ76urIbOAyQv0LhFOJlxPqMd7ye0iaLOmE+FtsIxxBtkzX0f+irXuA8yVNjevwd9oLUNKpkiZKErv+My2f2XYb0u7/L4GOtgPtlk3S8ZIOVLgXaCPhFFOH/+tcSgz3Kdzo1PK4N8lMceX5IOFwdCGhovBmQraGUOl5D/AQ4Uv7DaHiC8Lh6K3x8OxjaRY/g/BnfDwuexthpd2rzGwToZLyg4TDzNeB4zuY5ReE0wgPSdpEqJRqWYFuIxwWLyNU3P6rzbyfBGriYe6lhI3M7sb7V+A64FHCIfbTcdT2dmb5eIyvlvAH7GijtjtxbALeB5xN2EteSdh77J1g9jJCBWwd4ftaB/wkzWc8AvwX4aq3FYQ9urMThng1odJ6A+EqlD8lmcnMtgL/DTwZ1813xuFLgWcJe6n/7GARe3u9vZPwu9USKjY/EYc/CPyVUFezKH5OR6cV05lN2Ji2JIYnCBvUx9uZvjfwA8L/fCXhyPebcVxH/4tW4jp8LeGqsQXxuT2TgL8TkvzTwP8zs8fiuO8D/xl/p6/Q+f+vXZ1sBzoq2wjClYsbCaeYZhNOW7YrJ25wc7kt7qG/CPTOwNGUSyFpBrDczP6ziz5vJqHit0s+z3WNXDpicDlEoTmQYkkDCXvp93lSyCyF
O6I/TDjqdW6PeWJwmXIJ4dzyG4TzmYkqfN2ekfRdwlHZj81sYbbjcbnNTyU555xrxY8YnHPOtZJLjegBMGTIEKuoqMh2GM45l1Pmzp271syGJpk25xJDRUUF1dXV2Q7DOedyiqTOWhTYyU8lOeeca8UTg3POuVY8MTjnnGvFE4NzzrlWOk0MkvpK6hFf7yfptM6abHXOOZe7khwxPA6UxNYfHyH0hjUzk0E555zLniSJQbE1xw8D15vZh4CpmQ3LOedctiS5j0EKvXh9gtBRd9L5nHPO7aaGxmbWb22gdmsDtZvj85bwOOEdwzhozICMx5BkA38FoY/ke83sJUn7ENrZd8451wEzY9P2Ruq2NLBuS0Or59rUR8rGf9O29hshHtKvd/dIDGY2G5jd0vWimb0JfD7TgTnnXHezo6k5bNTT7M2ne9RtbWBHU/qGSot79mBw32IGlhYzuF8xYweWMqhvcbuPAX160bOoay4k7TQxxNNIvyF0SD1O0sHAJWb22UwH55xzmdKyN79zAx+f0+3Nt+zpd7Q3X96n186N+JiBpRw8ZgCD+hUzqDT9hr60uIjQG2j3k+RU0rXASYRu4zCzf0s6LsnCJZ1M6HKuCLjZzH7QZvw4QgfmA+I0V5rZA8nDd865YHtjE3VbduzcU3/LqZu48U8d19jc+d78oL679ubD+14M6tu79d58aS96ddHefFdIVIlsZkvaZLZOO4iPHU/fQOijdClQJWmWmc1Pmew/gXvM7H8lTQUeACoSxu6cy2NNzUbtlgbWbt7O2s3bWbc5vF6zeTtrNzVQu2U7tVt37NzD37w9/d68BAP69GJg37D3Pm5QKYeMHbBzoz6wtPgte/bdeW++KyRJDEskHQWYpGJC/cLLCeabDiyIdRJIugs4ndABdgsjdLoOUE7osN05l6caGptZtyVs2Fs2+Gs3p77elQBqtzSQboe+uKgHQ/oVM7hfbwb2LWbC4NK4Bx82/qnn7QeWFlPehefm80WSxHAp4XTQaMKe/0PA5xLMNxpYkvJ+KXBEm2muAh6SdDnQFzgx3YIkXQxcDDBu3LgEH+2c6yr1DU079+RbNuprN+3a6Ifh4fWG+h1pl1FaXMSQfr0Z0i/s0R86biBD+xUzpH/vOLz3zmRQVtKzoPfmu0KSq5LWEu5h2F3pfrm2+f8cYKaZ/TRWct8u6QAza24Tw03ATQCVlZXeF6lzXWRHUzMrN2xj+fp6lm+oZ1ldPcvWb2PZ+nqWr69nxfp6tjSkP7NcVtIzbNj79mbyiP4c3a/1Rr5l3JD+xZQW+61R3UmSq5JuBb5gZuvj+4HAT83swk5mXQqMTXk/hreeKroIOBnAzJ6WVAIMAVYnC98593Zs3t7IsrqwkV8aN/Yt75etr2fVxm1vOZ0zuG8xowf2YeLQfhw7aQhDUzbwLRv+wf2K6d2zKDuFcm9bkjR9UEtSADCzOkmHJpivCpgkaQKwDDgb+HibaRYD7wFmSpoClABrEkXunOtQc7OxdvP2tBv8Zeu3saxuKxvbXH7Zs4cYOaCE0QP6cOS+gxkzoA+jBvRh9MD4PKAPJb18g5/vkiSGHpIGmlkdgKRBSeYzs0ZJlwEPEi5FnRHvnL4GqDazWcCXgV9L+iLhNNP5ZuanipxLoLGpmRUbtrG4ditL67bGjf2ujf+KDfVvubmqf0lPRscN/LSKgTs39i3PQ/v3pqiHn78vdEkSw0+BpyT9Ib7/KPDfSRYe70l4oM2wb6e8ng8cnSxU5wrP5u2NLF63lcW1W1hcu5VF67ayuDY8ltXVt7oOX4Lh/UsYPbAPB48dwCkHjmT0gJKde/ujBvShrMRbzHedS7Lnf5ukucDxhArlD7e5F8E5t4eam41Vm7axeN1WFtVuZUmbjX/tloZW0w8o7cW4QaUcOLqcDxw4kvGDSxk7qJSxA0sZUV6SVzdZuexJeinAK0Bdy/SSxpnZ4oxF5VweqW9oYknd1jYb/3AEsKSunobGXRfh
FfUQowaUMG5QKSftP4Jxg0oZP7iUcYNCAijv43v8LvOSXJV0OfAdYBXhjmcR6gMOymxozuWOTdt28NqqzSyu3bJzj79l73/1pu2tpu3XuyfjBpUyaVh/TpwynLEpG/9RA/r4Xr/LuiRHDF8AJpvZukwH41x3Z2Ysratn/oqNvLzzsYnFtVt3TiPBiLKw1/+u/YbuPN0zfnBfxg0qZWBpL79By3VriZrEADZkOhDnupttO5p4bdUmXl6xkfnLQwJ4eeXGnS1sSjBhcF8OHF3OxyrH8I4RZUwY2tcv6XQ5L0lieBN4TNJfgJ3HxGb2s4xF5VwXW71p266N/4qNzF+xkTfXbN55c1dpcRFTRpZx+iGjmDKyjKkjy5g8or/fsevyUpK1enF8FMeHczlrR1Mzb67ZsnPj33I6aO3mXVf/jB7Qhykj+3PKASOYMrKMKSPLGDeolB5+fb8rEEkuV726KwJxbm/bsHVHq43//BUbeX3VZhqawlVAxT17sN/wfhw/eVg4ChhVxpQRZZSX+pU/rrAluSppKPA1YH9CkxUAmNkJGYzLud3S3Gy8tHwjj7++hucW1/Hyik0sW1+/c/yQfsVMGVnGBUdX7DwK2GdoX78CyLk0kpxK+i1wN3AqoQnu8/D2jFw3sGbTdv75+hoef20N/3x9LevizWATh/Xj8PED+eSR42MS6M+w/iWdLM051yJJYhhsZr+R9AUzmw3MljQ704E511ZDYzPPLq7j8dfWMPu1Nby0fCMQWvs8dtIQ3jV5KMdMHMrQ/r2zHKlzuS1JYmjpWWOFpA8Qms4ek7mQnNtl8bqtzI5HBU8tWMuWhiZ69hCHjR/IV0+azHGThrL/qDKvGHZuL0qSGL4nqZzQEur1hK44r8hoVK5gbW1o5F9vrmP2q2t4/PW1LFy7BYAxA/twxqGjOW6/oRy172D6e2NwzmVMksRQZ2YbCDe5HQ8gyVtEdXuFmfHKyk07Tw9V19TR0NRMn15FvHOfQZx35HiO228oE4b09buFnesiSRLD9cBhCYY5l0jdlgb+uWAtj78WThG1tCX0jhH9Of/oCo6bNJTKioF+97BzWdJuYoh9MB8FDJX0pZRRZYSOd5xLpLGpmXlL1oejgtfX8vzS9ZhBeZ9eHDtpCMftN5TjJg1lRLlfOeRcd9DREUMx0C9O0z9l+EbgzEwG5XLfig31zH41nB56YsFaNm1rpIfgkLEDuOI9+3HcfkM4aMwA7y3MuW6o3cSQcmnqTDNbBCCpB9DPzDZ2VYAud5gZTyxYyy1P1vDoq6sxg5HlJZxywEiO228ox0wc4ncVO5cDktQxfF/SpYS+GOYC5ZJ+ZmY/zmxoLlfUNzRx73PLmPnUQl5btZkh/Yq5/IRJnHrQSCYN6+eVxs7lmCSJYaqZbZT0CUL/zV8nJAhPDAVu+fp6bnt6EXdVLWb91h3sP6qMn3z0YD548Eh69/RqKOdyVZLE0EtSL+AM4JdmtkOSdTaTy09mxrOL65jxRA1/e2klZsZJ+4/ggqMnMK1ioB8dOJcHkiSGXwE1wL+BxyWNJ1RAuwLS0NjMX15Yzi1P1vD80g2UlfTkomMm8KkjxzNmYGm2w3PO7UVJmt2+DrguZdAiScdnLiTXnazdvJ07n1nM7f9axJpN29l3aF++e8YBfOSw0d5JjXN5qqP7GM41szva3MOQyntwy2MvLd/ALU/WMGvechqamnn35KFccPQEjp04xNslci7PdbTL1zc+9+9gGpdHmpqNh+evZMaTNcxZWEtpcRFnTRvLeUdVMHFYv2yH55zrIh3dx/Cr+Ow9uOW5DfU7uKdqCbc+XcPSunpGD+jDt06ZwsemjaW8j9934Fyh6ehU0nXtjQMws8/v/XBcV3pjzWZmPlnDH59dytaGJqZPGMR/fmAKJ04ZTk/v2cy5gtXRqaS58floYCqhFzeAj6aMcznGzHj89bXMeGIhs19bQ3FRD047ZBQXHF3B/qPKsx2ec64b6OhU0q0A
ks4HjjezHfH9jcBDXRKd22u2NjTyx2eXMfPJhbyxZgtD+/fmS+/dj48fMY4h/bzHM+fcLkmuNxxFqICuje/7xWEuByyt2xruTp6zmI3bGjloTDk/P+tgPnDgKIp7+uki59xbJUkMPwCek/RofP8u4KqMReT2iqZm44d/e4Wb//kmkjj5gBFceHQFh43zu5Odcx1LcoPbLZL+ChwRB11pZiszG5Z7O7Y2NPKFu+bx8PxVnDN9LJefMIlRA/pkOyznXI5IdOtqTAR/znAsbi9YvWkbn761mheXbeDq0/bnvKMqsh2Scy7HeJsGeeS1VZu44JYqarc0cNMnKzlx6vBsh+Scy0GeGPLEkwvWcukdcynpVcQ9lxzJgWP80lPn3J5JdFmKpGMkXRBfD5U0IbNhud3x++olnDdjDqPK+/B/nzvak4Jz7m3p9IhB0neASmAycAvQC7iDcOObyyIz42cPv8b1/1jAsZOGcMMnDqOsxJuwcM69PUlOJX0IOBR4FsDMlkvyhvWybHtjE1/7w/P8ed5yzp42lu+ecQC9vBkL59xekCQxNJiZtfTaJqlvZzO4zKrb0sAlt89lTk0tXz1pMp99975+b4Jzbq9Jsot5j6RfAQMkfQb4O/DrJAuXdLKkVyUtkHRlO9N8TNJ8SS9JujN56IVp0botfOR/n2Le0vVcd86hfO74iZ4UnHN7VZIb3H4i6b2E7jwnA982s4c7m09SEXAD8F5gKVAlaZaZzU+ZZhLwDeBoM6uTNGwPy1EQ5i6q5TO3zcXMuPPTR1BZMSjbITnn8lCSyue+wD/M7GFJk4HJknq1NKrXgenAAjN7My7nLuB0YH7KNJ8BbjCzOgAzW70nhSgEf3l+BV+8Zx6jyku45YLpTBjiZ/Scc5mR5FTS40BvSaMJp5EuAGYmmG80sCTl/dI4LNV+wH6SnpT0L0knp1uQpIslVUuqXrNmTYKPzh9mxo2z3+Bzdz7LQaPL+dNnj/ak4JzLqCSJQWa2FfgwcL2ZfYjQP0On86UZZm3e9wQmAe8GzgFuljTgLTOZ3WRmlWZWOXTo0AQfnR8am5r55r0v8oO/vsIHDx7FHZ8+gkF9i7MdlnMuzyW5KkmSjgQ+AVy0G/MtBcamvB8DLE8zzb/iaamFkl4lJIqqBMvPa5u27eBzdz7H46+t4XPH78uX3zuZHj28ktk5l3lJjhiuIFQQ32tmL0naB3i0k3kgbNwnSZogqRg4G5jVZpr/A44HkDSEcGrpzaTB56vl6+v56I1P8+SCtfzwIwfy1ZPe4UnBOddlklyVNBuYnfL+TaDT/p7NrFHSZcCDQBEwIyaWa4BqM5sVx71P0nygCfiqma3bs6LkhxeXbeCiW6vYur2JmRdM49hJhXPqzDnXPcis7Wn/OEK61syukHQfb60bwMxOy3Rw6VRWVlp1dXU2PjrjHn1lNZ+781kG9OnFjAum8Y4RZdkOyTmXJyTNNbPKJNN2dMRwe3z+ydsPyXXm9qdr+M6sl5g6qozfnDeN4WUl2Q7JOVeg2k0MZjY3viwiVBBv7ZqQCktzs/E/D7zMzU8s5MQpw/jF2YfSt7e3hu6cy54kW6DzgRslrQP+GR9PtNyU5vZcfUMTX7x7Hn97aSXnH1XBf506lSKvZHbOZVmSyudPAUgaBZxJaOZiVJJ5XfvWbNrOp2+r5vml6/n2qVO58Bjv4sI51z0kaRLjXOBY4EBgLfBLwlGD20MLVm/i/FuqWLt5Ozeeezgn7T8i2yE559xOSfb6rwXeAG4EHjWzmoxGlOeeemMtl94+l+KeRdx98ZEcPPYtN3o751xWdXqDm5kNAS4ESoD/ljRH0u2dzObS+OPcpZw3Yw7Dy0q497NHeVJwznVLSU4llQHjgPFABVAONGc2rPxiZlz799f5xSOvc9S+g/nfcw+nvI93wemc656SnEp6IuXxSzNbmtmQ8ouZceUfX+Du6iWcefgY/udDB1Lc07vgdM51Xx0mhtjZzkNm9pUu
iifvLK2r5+7qJZx/VAXf+eBU723NOdftdbjramZNwKFdFEteql5UC8BZ08Z6UnDO5YQkp5LmSZoF/B7Y0jLQzP6UsajyyJyFdfQv6cl+w/tnOxTnnEskSWIYBKwDTkgZZoAnhgSqa2qpHD/Q72h2zuWMJHc+X9AVgeSjui0NvL56M2cc2rZHU+ec677aTQySvmZmP5J0Pemb3e60T4ZCV70oNCc1fcKgLEfinHPJdXTE8HJ8zs/OD7pAVU0txUU9OHB0ebZDcc65xDpqdvu++Hxr14WTX6pqajl4bDklvYqyHYpzziWW5M7nocDXgamEZjEAMLMT2p3JUd/QxAtLN/CZ4/bJdijOObdbktyC+1vCaaUJwNVADVCVwZjywrwl62lsNqZXeP2Ccy63JEkMg83sN8AOM5ttZhcC78xwXDmvqqYWCQ4bPzDboTjn3G5Jch/Djvi8QtIHgOXAmMyFlB+qamqZPLy/N5bnnMs5SRLD9ySVA18GrgfKgC9mNKoc19jUzLOL6vjwYZ4/nXO5J8kNbvfHlxuA4zMbTn54ZeUmtjQ0Mc3vX3DO5aBO6xgk7SPpPklrJa2W9GdJfqlNB+YsDA3nTavw+gXnXO5JUvl8J3APMAIYRWhM73eZDCrXVS+qZczAPows75PtUJxzbrclSQwys9vNrDE+7iBNExkuMDPmLKxjml+m6pzLUUkqnx+VdCVwFyEhnAX8RdIgADOrzWB8Oadm3VbWbt7uicE5l7OSJIaz4vMlbYZfSEgUXt+QoqrG6xecc7ktyVVJE7oikHxRtbCWgaW9mDisX7ZDcc65PeK90u9l1YvqOHz8IO/G0zmXszwx7EWrN21j4dotTJ/gp5Gcc7nLE8NeNLcmdMxT6RXPzrkcluQGt4vavC+S9J3MhZS75tTUUtKrBweM8o55nHO5K8kRw3skPSBppKQDgH8B/TMcV06qrqnjkLEDKO7pB2LOudyV5Kqkj0s6C3gB2AqcY2ZPZjyyHLN5eyMvLd/AZcdPzHYozjn3tiQ5lTQJ+ALwR0InPZ+UVJrhuHLOc4vraDavX3DO5b4k5zzuA/7LzC4B3gW8jvfg9hZVC2vp4R3zOOfyQJI7n6eb2UYAMzPgp5JmZTas3FNVU8fUUWWZKl70AAAWp0lEQVT0653kK3XOue4rSR3DxljpPBUoSRn1esaiyjENjc08t6SOc6aPy3Yozjn3tiWpY/gOoee26wkd9fwIOC3JwiWdLOlVSQtiQ3ztTXemJJNUmTDubuWl5RvYtqPZG85zzuWFJHUMZwLvAVaa2QXAwUDvzmaSVATcALyfcLRxjqSpaabrD3weeGY34u5WWhrOq/SG85xzeSBJYqg3s2agUVIZsJpkLapOBxaY2Ztm1kBotvv0NNN9l3AUsi1hzN1OVU0dFYNLGda/pPOJnXOum0uSGKolDQB+DcwFngXmJJhvNLAk5f3SOGwnSYcCY1P6lU5L0sWSqiVVr1mzJsFHd53mZqO6ptZPIznn8kaSyufPxpc3SvobUGZmzydYdrrmRXf2/CapB/Bz4PwEMdwE3ARQWVnZrXqPe3PtZuq27vDE4JzLG4murZR0EFDRMr2kiWb2p05mWwqMTXk/Blie8r4/cADwWGyiegQwS9JpZladKPpuYM7C0HDetAmeGJxz+aHTxCBpBnAQ8BLQHAcb0FliqAImSZoALAPOBj7eMtLMNgBDUj7nMeAruZQUAKprahnSr5iKwX4zuHMuPyQ5Yninmb3laqLOmFmjpMuAB4EiYIaZvSTpGqDazPLiJrk5sX7BO+ZxzuWLJInhaUlTzWz+7i7czB4AHmgz7NvtTPvu3V1+tq3YUM/SunouONp7P3XO5Y8kieFWQnJYCWwnVCqbmR2U0chyQFXsmGe6Vzw75/JIksQwA/gkodnt5k6mLSjVNbX0LS5iykjvnsI5lz+SJIbF+VIfsLfNWVjLYeMH0rPIO+ZxzuWPJInhFUl3Eprf3t4yMMHlqnlt
Q/0OXl21ifcfMDLboTjn3F6VJDH0ISSE96UMS3K5al57dlEdZjBtgreP5JzLL+0mBknnAA/FhvNcG1U1tfTsIQ4d64nBOZdfOjpiGA/8XlIv4BHgr8Cc2FlPwauqqeWA0eX0KS7KdijOObdXtVtramY/MLMTgFOAfwMXAs9KulPSpyQN76ogu5ttO5r495INTPNmtp1zeShJI3qbgHvjg9inwvuB24CTMhpdN/XCsg00NHnHPM65/JS0Eb3RhFNLLdNXmdlPMxZVNzdnYUvHPJ4YnHP5J0kjej8EzgLmA01xsAGPZzCubq26ppaJw/oxqG9xtkNxzrm9LskRwxnAZDPb3umUBaCp2aheVMepB43KdijOOZcRSW7ZfRPolelAcsVrqzaxaVujVzw75/JWkiOGrcA8SY/Q+s7nz2csqm6sqibUL3jFs3MuXyVJDLPiwxFaVB1RVsKYgX2yHYpzzmVEkstVb+2KQHKBmVG1sJZpE7xjHudc/uqoSYx7zOxjkl4gXIXUSiH2x7C0rp6VG7d5/YJzLq91dMTwhfh8alcEkgu8fsE5VwjaTQxmtiI+LwKQVNbR9IWgqqaO/iU92W+4d8zjnMtfSW5wuwS4Bqhn1yklA/bJYFzdUlVNLZXjB1LUw+sXnHP5K8kRwFeA/c1sbaaD6c5qtzSwYPVmPnTo6GyH4pxzGZXkBrc3CPcyFLTqWL8wfYLXLzjn8luSI4ZvAE9JeoYCvsGtelEdxUU9OHB0ebZDcc65jEqSGH4F/AN4AWjObDjd15yFtRw8tpySXt4xj3MuvyVJDI1m9qWMR9KN1Tc08eKyDXzmuIKrb3fOFaAkdQyPSrpY0khJg1oeGY+sG3luSR2NzcZ0v3/BOVcAkhwxfDw+fyNlWEFdrlpdU4cEh433O56dc/kvSVtJE7oikO6sqqaWycP7U97HWx93zuW/JKeSClpjUzPPLqrzZjCccwXDE0MnXl6xiS0NTUzz+xeccwWi3cQg6ej43Lvrwul+djWc5/ULzrnC0NERw3Xx+emuCKS7qqqpZczAPows9455nHOFoaPK5x2SbgFGS7qu7chCuPPZzKiqqePYSUOyHYpzznWZjhLDqcCJwAnA3K4Jp3upWbeVtZu3e8Wzc66gdNQfw1rgLkkvm9m/uzCmbsPrF5xzhSjJVUnrJN0rabWkVZL+KGlMxiPrBqoW1jKwtBcTh/XLdijOOddlkiSGW4BZwChgNHBfHJb3qhfVcfj4QUjeMY9zrnAkSQzDzOwWM2uMj5nA0AzHlXWrN21j4dotTJ/gp5Gcc4UlSWJYI+lcSUXxcS6wLtOBZVt1TR0AlV7x7JwrMEkSw4XAx4CVwArgzDisU5JOlvSqpAWSrkwz/kuS5kt6XtIjksbvTvCZVFVTS0mvHhwwyjvmcc4VliSN6C0GTtvdBUsqAm4A3gssBaokzTKz+SmTPQdUmtlWSf8B/Ag4a3c/KxOqamo5ZOwAint6qyHOucKSya3edGCBmb1pZg3AXcDpqROY2aNm1tKf9L+AbnG10+btjcxfvtH7X3DOFaRMJobRwJKU90vjsPZcBPw13YjYUVC1pOo1a9bsxRDTe3ZRHc3m9QvOucKUycSQ7hpPSzthqNCuBH6cbryZ3WRmlWZWOXRo5i+Iqq6ppYd3zOOcK1CdJgZJ5ZJ+3rLHLumnkpLUyC4Fxqa8HwMsT7P8E4FvAaeZ2fakgWfSnJpapo4qo1/vJB3cOedcfklyxDAD2Ei4Mulj8XWSG9yqgEmSJkgqBs4m3Ci3k6RDgV8RksLq3Qk8Uxoam5m3ZL23j+ScK1hJdon3NbOPpLy/WtK8zmYys0ZJlwEPAkXADDN7SdI1QLWZzSKcOuoH/D7eXbzYzHb7Cqi96cXlG9i2o9kTg3OuYCVJDPWSjjGzJ2BnBz71SRZuZg8AD7QZ9u2U1yfuRqxdojo2nFfpDec55wpUksTwH8CtsV5BQC1wfiaDyqY5C+uoGFzKsP4l2Q7F
OeeyIskNbvOAgyWVxfcbMx5VljQ3G3MX1XLilOHZDsU557Km3cQg6Vwzu0PSl9oMB8DMfpbh2LrcG2s2U7d1h9cvOOcKWkdHDH3jc/8049Lej5DrqmLDedMmeGJwzhWujnpw+1V8+XczezJ1XKyAzjtVNbUM6VdMxeDSbIfinHNZk+Q+husTDst5VTW1TKvwjnmcc4WtozqGI4GjgKFt6hnKCPcl5JUVG+pZWlfPBUdPyHYozjmXVR3VMRQTbj7rSet6ho2EPhnySkv9greo6pwrdB3VMcwGZkuaaWaLujCmrKhaWEvf4iKmjExX1+6cc4UjyQ1uWyX9GNgf2HnXl5mdkLGosqCqppbDxg+kZ5F3zOOcK2xJtoK/BV4BJgBXAzWEBvLyxob6Hby6apPfv+CccyRLDIPN7DfADjObbWYXAu/McFxd6tlFdZh5+0jOOQfJTiXtiM8rJH2A0KdCt+iCc2+ZU1NLzx7i0LGeGJxzLkli+F5sQO/LhPsXyoAvZjSqLlZdU8sBo8vpU5x3V+E659xu6zAxSCoCJpnZ/cAG4PguiaoLbdvRxL+XbOD8oyuyHYpzznULHdYxmFkTkNWOczLthWUbaGhqptL7d3bOOSDZqaSnJP0SuBvY0jLQzJ7NWFRdaM7Clo55/Iok55yDZInhqPh8TcowA/LiPobqmlomDuvHoL7F2Q7FOee6hSQd9eRdvUKLpmajelEdpx40KtuhOOdct1HQt/m+unITm7Y1Ms3vX3DOuZ0KOjFULwr1C37Hs3PO7VLQiWHOwlpGlJUwZmCfbIfinHPdRpLKZyQdBVSkTm9mt2Uopi5hZlTV1DJ9wmDvmMc551J0mhgk3Q7sC8wDmuJgA3I6MSytq2fVxu1ev+Ccc20kOWKoBKaamWU6mK5UVeP1C845l06SOoYXgRGZDqSrVdXU0r+kJ/sN9455nHMuVZIjhiHAfElzgO0tA80sp5vKqKqpo3L8QIp6eP2Cc86lSpIYrsp0EF2tdksDC1Zv5kOHjs52KM451+0kufN5dlcE0pWqY/3C9Alev+Ccc211Wscg6Z2SqiRtltQgqUnSxq4ILlOqamopLurBgaPLsx2Kc851O0kqn38JnAO8DvQBPh2H5ayqmjoOHltOSS/vmMc559pKdOezmS0AisysycxuAd6d0agyaGtDIy8u2+DNbDvnXDuSVD5vlVQMzJP0I2AF0DezYWXOvCXraWw2pnticM65tJIcMXwyTncZoaOescBHMhlUJlUtrEOCw7zHNuecSyvJVUmLJPUBRprZ1V0QU0ZVL6pl8vD+lPfple1QnHOuW0pyVdIHCe0k/S2+P0TSrEwHlgmNTc08u6jOm8FwzrkOJDmVdBUwHVgPYGbzCC2t5pyXV2xiS0MT0/z+Beeca1eSxNBoZhsyHkkXmLOz4TyvX3DOufYkuSrpRUkfB4okTQI+DzyV2bAyo7qmljED+zCy3Dvmcc659iQ5Yrgc2J/QgN7vgI3AFUkWLulkSa9KWiDpyjTje0u6O45/RlJF8tB3T0vHPF6/4JxzHUtyVdJW4FvxkZikIuAG4L3AUqBK0iwzm58y2UVAnZlNlHQ28EPgrN35nKRq1m1l7eYGTwzOOdeJdhNDZ1ceJWh2ezqwwMzejMu7CzgdSE0Mp7Or9dY/AL+UpEx0ClS10OsXnHMuiY6OGI4ElhBOHz0D7G7HBaPj/C2WAke0N42ZNUraAAwG1qZOJOli4GKAcePG7WYYwYDSXrx36nD2Hdpvj+Z3zrlC0VFiGEE4DXQO8HHgL8DvzOylhMtOl0jaHgkkmQYzuwm4CaCysnKPjibet/8I3rd/3nVE55xze127lc+xwby/mdl5wDuBBcBjki5PuOylhOYzWowBlrc3jaSeQDlQm3D5zjnnMqDDymdJvYEPEI4aKoDrgD8lXHYVMEnSBGAZcDbhyCPVLOA84GngTOAfmahfcM45l1xHlc+3AgcAfwWuNrMXd2fBsc7gMuBB
oAiYYWYvSboGqDazWcBvgNslLSAcKZy9h+Vwzjm3l6i9HXRJzYTWVKH1eX8BZmZlGY4trcrKSquurs7GRzvnXM6SNNfMKpNM2+4Rg5kl6sTHOedcfvGNv3POuVY8MTjnnGvFE4NzzrlW2q187q4krQEW7eHsQ2hzV3UBKLQyF1p5wctcKN5umceb2dAkE+ZcYng7JFUnrZXPF4VW5kIrL3iZC0VXltlPJTnnnGvFE4NzzrlWCi0x3JTtALKg0MpcaOUFL3Oh6LIyF1Qdg3POuc4V2hGDc865TnhicM4510pBJAZJJ0t6VdICSVdmO569RdIMSaslvZgybJCkhyW9Hp8HxuGSdF38Dp6XdFj2It9zksZKelTSy5JekvSFODxvyy2pRNIcSf+OZb46Dp8g6ZlY5rslFcfhveP7BXF8RTbj31OSiiQ9J+n++D7fy1sj6QVJ8yRVx2FZWa/zPjFIKgJuAN4PTAXOkTQ1u1HtNTOBk9sMuxJ4xMwmAY/E9xDKPyk+Lgb+t4ti3NsagS+b2RRCB1Kfi79nPpd7O3CCmR0MHAKcLOmdwA+Bn8cy1wEXxekvAurMbCLw8zhdLvoC8HLK+3wvL8DxZnZIyv0K2VmvzSyvH4S+qx9Mef8N4BvZjmsvlq8CeDHl/avAyPh6JPBqfP0r4Jx00+XyA/gzoQvagig3UAo8S+g/fS3QMw7fuZ4T+kA5Mr7uGadTtmPfzXKOIWwITwDuJzT3n7fljbHXAEPaDMvKep33RwzAaGBJyvulcVi+Gm5mKwDi87A4PO++h3jK4FDgGfK83PG0yjxgNfAw8Aaw3swa4ySp5dpZ5jh+AzC4ayN+264FvgY0x/eDye/yQuj35iFJcyVdHIdlZb3usGvPPKE0wwrxGt28+h4k9QP+CFxhZhuldMULk6YZlnPlNrMm4BBJA4B7gSnpJovPOV1mSacCq81srqR3twxOM2lelDfF0Wa2XNIw4GFJr3QwbUbLXAhHDEuBsSnvxwDLsxRLV1glaSRAfF4dh+fN9yCpFyEp/NbMWvogz/tyA5jZeuAxQv3KAEktO3ep5dpZ5ji+nNB1bq44GjhNUg1wF+F00rXkb3kBMLPl8Xk1IflPJ0vrdSEkhipgUryioZjQr/SsLMeUSbOA8+Lr8wjn4FuGfypezfBOYEPLIWouUTg0+A3wspn9LGVU3pZb0tB4pICkPsCJhErZR4Ez42Rty9zyXZwJ/MPiiehcYGbfMLMxZlZB+L/+w8w+QZ6WF0BSX0n9W14D7wNeJFvrdbYrXLqoUucU4DXCedlvZTuevViu3wErgB2EPYiLCOdWHwFej8+D4rQiXJ31BvACUJnt+PewzMcQDpmfB+bFxyn5XG7gIOC5WOYXgW/H4fsAc4AFwO+B3nF4SXy/II7fJ9tleBtlfzdwf76XN5bt3/HxUst2KlvrtTeJ4ZxzrpVCOJXknHNuN3hicM4514onBuecc614YnDOOdeKJwbnnHOteGJwXUqSSfppyvuvSLpqLy17pqQzO5/ybX/OR2Prro8mjUfSzdlovFHSpZI+1dWf63JbITSJ4bqX7cCHJX3fzNZmO5gWkoosNDuRxEXAZ82sw8SQysw+vWeRvT1mdmM2PtflNj9icF2tkdB37Rfbjmi7xy9pc3x+t6TZku6R9JqkH0j6ROyj4AVJ+6Ys5kRJ/4zTnRrnL5L0Y0lVse36S1KW+6ikOwk3CbWN55y4/Bcl/TAO+zbhJrsbJf24zfSS9EtJ8yX9hV0NniHpMUmVLeWS9MPYWNrfJU2P49+UdFqCmB+T9AdJr0j6bbwbnPi9zI/T/yQOu0rSV+LrQyT9K46/V7va9n8sxjMnfm/HxuH7x2Hz4jyTEv/KLqf5EYPLhhuA5yX9aDfmOZjQcFwt8CZws5lNV+io53LgijhdBfAuYF/gUUkTgU8RmgyYJqk38KSkh+L004EDzGxh6odJGkVo1/9wQtv/D0k6w8yukXQC8BUzq24T44eA
ycCBwHBgPjAjTVn6Ao+Z2dcl3Qt8j9B0+FTgVkJzBxd1EPOhwP6EtnGeBI6WND9+/jvMzFqa0GjjNuByM5st6RrgOynfW8/4fZ4Sh58IXAr8wsx+q9CcTFGaZbo85EcMrsuZ2UbCRurzuzFblZmtMLPthGYAWjaSLxCSQYt7zKzZzF4nJJB3ENqd+ZRCs9XPEJoZaNn7ndM2KUTTCBvvNRaacv4tcFwnMR4H/M7Mmiw0iPaPdqZrAP6WEv9sM9vRpiydxbzUzJoJTYJUABuBbcDNkj4MbE39QEnlwAAzmx0H3dqmPC2NEc5NieFp4JuSvg6MN7P6Tsrv8oQnBpct1xL2ivumDGskrpPx9EhxyrjtKa+bU9430/rIt20bL0ZoV+ZyCz1jHWJmE8ysJbFsaSe+dtvx7kSSNmZ22K62aHaWJW7oW8rSUcyp30UTYW+/kXD080fgDHYlnqRaltnUEoOZ3QmcBtQDD8YjJVcAPDG4rDCzWuAednXPCKEHq8Pj69OBXnuw6I9K6hHrHfYh9Gz1IPAfCs11I2k/hRYsO/IM8C5JQxS6hz0HmN3JPI8DZ8f6gZHA8XsQf4vdilmhf4pyM3uAcHrokNTxZrYBqGupPwA+SSflkbQP8KaZXUc4vXXQnhbG5RavY3DZ9FPgspT3vwb+LGkOoSXJ9vbmO/IqYYM3HLjUzLZJuplweuTZeCSyhrBX3S4zWyHpG4SmngU8YGZ/7mgeQhv6JxBOCb1G54mkI7sbc3/Cd1cS431L5T6h2eYbJZUSTrNd0EkMZwHnStoBrASu2a0SuJzlras655xrxU8lOeeca8UTg3POuVY8MTjnnGvFE4NzzrlWPDE455xrxRODc865VjwxOOeca+X/A9WhtM17yDn4AAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def run_d_n(dim,N_pts,L):\n",
    "    pts=np.random.rand(N_pts,dim)-0.5 # sample N_pts points uniformly from the cube [-0.5, 0.5]^dim\n",
    "    ratio_list=[]\n",
    "    for i in range(N_pts):\n",
    "        # ignore the data point itself\n",
    "        selected_pts=np.array([j for j in range(N_pts) if j!=i])\n",
    "        # calculate the distance (norm of order L) to every other point\n",
    "        dist=np.linalg.norm(pts[selected_pts]-pts[i],L,axis=1)\n",
    "        #print(\"dist is: \",dist)\n",
    "        # calculate the ratio of the min. distance to the max. distance\n",
    "        ratio=np.min(dist)/np.max(dist)\n",
    "        ratio_list.append(ratio)\n",
    "    # output the mean ratio\n",
    "    return np.mean(ratio_list)\n",
    "\n",
    "# Initialise the N_pts, the number of points we simulate\n",
    "N_pts=1000\n",
    "# Order of the norm: l=1 gives the L1 (Manhattan) distance; set l=2 for the L2 (Euclidean) distance\n",
    "l=1\n",
    "# Setting the number of dimensions we simulate\n",
    "check_dim=range(1,550,50)\n",
    "# Calculate the mean ratio on that dimension\n",
    "ratio_list=[ run_d_n(dim,N_pts,l) for dim in check_dim]\n",
    "# Plot the ratio with its corresponding dimension\n",
    "plt.plot(check_dim,ratio_list)\n",
    "plt.ylabel(\"Mean ratio of min/max pairwise distances\")\n",
    "plt.xlabel(\"Number of dimensions\")\n",
    "plt.title(\"Effect of increasing dimensionality on pairwise distances\")\n",
    "plt.xticks(np.arange(0, 600, step=100))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question:** how can this plot be interpreted? How else could you visualize this effect?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. The mean ratio of minimum to maximum pairwise distance increases as the number of dimensions increases and then approaches a constant, which is one face of the curse of dimensionality. In other words, the minimum and maximum distances become approximately the same, so distance loses its power to distinguish near neighbours from far ones.\n",
    "\n",
    "2. Something this plot doesn't show is the actual distances. To see these you could plot a histogram of all pairwise distances for a low and a high dimensionality and compare the results."
   ]
  },
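  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The histogram comparison suggested in point 2 could be sketched as follows. This is only a minimal sketch under stated assumptions: the helper name `pairwise_distances`, the 200 sample points, the two dimensionalities (2 and 500) and the fixed random seed are illustrative choices, not part of the original exercise. In 2 dimensions the distances should spread over a wide range starting near zero, whereas in 500 dimensions we would expect them to bunch into a narrow band well away from zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def pairwise_distances(dim, n_pts, rng):\n",
    "    # hypothetical helper: sample n_pts points uniformly from the cube [-0.5, 0.5]^dim\n",
    "    pts = rng.random((n_pts, dim)) - 0.5\n",
    "    # indices of every unordered pair (i < j), so each distance is counted once\n",
    "    i, j = np.triu_indices(n_pts, k=1)\n",
    "    # L2 (Euclidean) distance for each pair\n",
    "    return np.linalg.norm(pts[i] - pts[j], axis=1)\n",
    "\n",
    "rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability\n",
    "fig, axes = plt.subplots(1, 2, figsize=(10, 4))\n",
    "for ax, dim in zip(axes, [2, 500]):\n",
    "    dist = pairwise_distances(dim, 200, rng)\n",
    "    ax.hist(dist, bins=40)\n",
    "    ax.set_xlim(left=0)  # start the axis at zero to show how far the band sits from it\n",
    "    ax.set_title(f'd = {dim}')\n",
    "    ax.set_xlabel('Pairwise L2 distance')\n",
    "    ax.set_ylabel('Number of pairs')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },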
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2) Implement Nearest Neighbour from scratch\n",
    "\n",
    "The following will give some practice in implementing a simple classifier, the $k$-Nearest Neighbour ($k$NN) algorithm, and should help us to write a $k$NN package from scratch. Most machine learning methods include two main steps, namely training (fitting a model to the training data) and prediction (running the model on input data to generate output). However, since the $k$NN algorithm has no explicit model-building step, we only need to implement the prediction step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Creation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(500, 2) (500, 2)\n"
     ]
    }
   ],
   "source": [
    "mean_01 = np.array([1, 0.5])\n",
    "cov_01 = np.array([[1, 0.1], [0.1, 1.2]])\n",
    "\n",
    "mean_02 = np.array([4, 5])\n",
    "cov_02 = np.array([[1, 0.1], [0.1, 1.2]])\n",
    "\n",
    "dist_01 = np.random.multivariate_normal(mean_01, cov_01, 500)\n",
    "dist_02 = np.random.multivariate_normal(mean_02, cov_02, 500)\n",
    "print(dist_01.shape, dist_02.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have created two 2-dimensional normal distributions of data points with the same covariance but different means."
   ]
  },
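  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check, the sample mean and covariance of such a draw can be compared with the parameters passed in. The cell below is a minimal sketch, assuming a seeded `np.random.default_rng` generator (not used in the original cell) and a fresh variable `sample`, so that `dist_01` and `dist_02` are left untouched. With 500 points the estimates should land near, but not exactly on, the specified values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability\n",
    "mean = np.array([1, 0.5])\n",
    "cov = np.array([[1, 0.1], [0.1, 1.2]])\n",
    "sample = rng.multivariate_normal(mean, cov, 500)\n",
    "\n",
    "# estimates computed from the 500 sampled points\n",
    "print('sample mean:', sample.mean(axis=0))\n",
    "print('sample covariance:')\n",
    "print(np.cov(sample, rowvar=False))"
   ]
  },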
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting the created Data "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What does the data look like? Notice the 2 distinct clusters being formed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAD8CAYAAACSCdTiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJztnX+YHGWV77+nO53QE3QmgeCSyUSilw1XSSBmVrOL111FCS6/RtwN4i+8spvrs/6MbiTofSC47CWavYB7r/sjF9C9D4hkMUT8GRT0XuXZcJk4SSBCFhVJMkEZfkyUpMn0zJz7R3XNVFfXW/VWd/V0V/X38zw8k66ueuvtnuH7njrnvOeIqoIQQkg2ybV6AoQQQpoHRZ4QQjIMRZ4QQjIMRZ4QQjIMRZ4QQjIMRZ4QQjKMtciLyG0i8oyIPOo5Nl9Evi8iT1R+zmvONAkhhNRDHEv+KwDO9x3bAOB+VT0dwP2V14QQQtoEibMZSkROA/AtVT2z8no/gD9R1adF5FQAP1LVpc2YKCGEkPjMavD6V6jq0wBQEfpTTCeKyFoAawFg7ty5K88444wGb00IIZ3Frl27nlXVBXGuaVTkrVHVLQC2AEB/f78ODg7O1K0JISQTiMhTca9pNLvmNxU3DSo/n2lwPEIIIQnSqMjfC+CKyr+vAPCNBscjhBCSIHFSKO8E8G8AlorIIRG5EsAmAG8TkScAvK3ymhBCSJtg7ZNX1csNb52b0FwIIYQkDHe8EkJIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE0JIhqHIE5JG9m4FbjoT2Njj/Ny7tdUzIm2Kdfs/QkibsHcr8M2PAeWS8/rIQec1ACxf07p5kbaEljwhaeP+z00LvEu55BwnxEciIi8i60Rkn4g8KiJ3isgJSYxLCAngyKF4x0lH07DIi0gvgI8B6FfVMwHkAbyr0XEJIQa6F8U7TjqapNw1swAURWQWgC4AhxMalxDi59xrgEKx+lh+NjB2lIFYUkPDIq+qwwD+DsABAE8DOKKq9/nPE5G1IjIoIoMjIyON3paQzmX5GuCivwe6+wAIUJwPqAKl5wHodCCWQk+QjLtmHoBLACwBsBDAXBF5r/88Vd2iqv2q2r9gwYJGb0tIZ7N8DbDuUWDjKDB7LjBZrn6fgVhSIQl3zVsBPKmqI6paBrANwB8lMC4h2STpHPekA7HMwc8USYj8AQCrRKRLRATAuQAeS2BcQmaOOMLWiAi6Oe5HDiIx14op4Cq5+OM2Y36kpSThk38IwN0AfgrgkcqYWxodl5AZI46wNSqCzchxDwrEAoBOVM/NZnFiDn7mSCS7RlWvVdUzVPVMVX2fqh5PYlxCZoQ4wtaoCNbjWokSZzcQK/naa9252S5OzMHPHNzxSohR2A7WCmujIhg3x91WnJevAXTSMLeD9osTc/AzB0WekFAB8wlroyIY5FopFJ3jQcR5cgibw5GDhuO+xSnQ9SPA6eeZxyZtDUWeEJNP24srrFEibeNaOevd064VyTuvTYXF4jw5nHsNAAn/HH78C4M7v6pxFNjzVQZfUwpFnhD/5iITRw7Vntvd57xevsbOtbJ3qyOYOuG81olwAY3z5LB8jXNfayT4CeKJ+2rHYfA1tVDkCQGqNxd19wWf4wqr99x1j05b4TaulbiB27juHdPcA9HgJ4iwGAWt+dRBkSfET1xhdbFxrcQN3IY9OQQRx2UTtZgFwZz51MGmIYT48VrmRw45onfuNdENOboXBQc4JQds7K744Q3uFFdY924Nvq9tM5Dla4ADO4HB28z3AsIXrXOvqW5K4sV96mBzktQgqnF8eMnQ39+vg4ODM35fQpqKv2OTLYWiY50Dtde778UV
Vf9icfp5jq/ddtHauxXY9peGN8VxVZEZR0R2qWp/nGtoyRPix2RNR+F/ApDcdIDVRHff9Pg3nWn218cR+Xrn7/8s372qUtnSP2fmzKcJijwhXhrtn+p1rWzsiT5/3aPT/05it2lS/V/3bgWO/672eH52dbpoo4sJaToMvBLiJcnaLTYWrzevXgwB07DdsP6c/KTmf//nassXA8DsE+3TRUlbQEueEC9J1m4JC2ACQGEusP2vpsU0KD7mWs5BPvY9X6212E33ijt/0/mlF5yfYYuJa83T0m8LaMkT4qU4z/CGxi8r7KY/FufXvper7HgNspa9zD7R+em3mgdvCxZZMfwvbfxcBkznu08VUYshLf22gZY8STcma9HGivSfM/9VwYFGlyD/tncMVxhLL9Te02uxA8CkApNHoz9f6YVgq9mUHmkqUhYHG3+8KV3UXQRsLH0yI1DkSXoxBRkP7Ax2ZQDV4uy/1lTEy4tXqPxjeBeIIweBbWuduTxxX4DFbinG3YuSKfPrullsiPLHA8GuKG/uPUsWtw1015D0YrIWd32lvvICtrhCFTmGAoO32i0eQbiiaZuyWCgGu4YA+zH2bjXP17tQRO3EZcnitoGWPEkvJqvQlJtuU17Ahim/dJ3ibXWPPie4ev/nwu8jOSdg67qHgHALOwz3ycREcZ4Tl/C6wLwpoF6iLH0yY1DkSXoxlhHIBwu914o0XRuFK1R7t8KpEdOkHeNucDVqfJ0E+q8ELrxx+tiBnc7TjE5ElzL2EvZkkp/t+Oldl1RU/n29pSFI4tBdQ9KFNzd87KgjPl4KRWDlB6ILjNnUkA9Cco6I3vMhNE3gp7Acf/C26j6uQaWMv/VJc5179zsNW/Rmn1jrpy+XnNIHpqwjf7VOoP4G6KRuWLuGpIeg2jC5AjDnZbUZLfVk13jru9iUJGgnuvscITWKte+pw62JY1PMrLuv4t6KKHgWVmMn6HdXb12eZpGCvP56atckIvIi0gPgFgBnwvlL+KCq/pvpfIo8qQuTgLkCZ4NtymUz/e3Norsv3rwlF51y6QpxVGzAvb/p95DE766ZpGERQmsLlH0RwPdU9c9EZDaAroTGJVmkXovJZgNO2LhxUi6b6W9vFnHnHSXw3uJpQHSFzbBgdrunVGY4r79hn7yIvBzAmwDcCgCqOqaqrENKgmlkJ2RYWp7NuKb/kQdvNWw2itkvtanYziVI4Ov4HF4L+6YznZz/WSEpmkB4emS7p1S2+yLUAEkEXl8FYATAl0VkSERuEZG5/pNEZK2IDIrI4MjISAK3JamkkQJaYR2bbMaN/T+shouan1xhukF3kkjOmYupZEEY3X1A/wdjBpllOoPIu3CWngfGS042j+n3YGpkXm+3rZmi3RehBkhC5GcBeB2Af1TVFQCOAtjgP0lVt6hqv6r2L1iwIIHbklTSiMUUtgHHZty4/8N29wFXPWkv9K97fzJlBfy4Y9Yz9ovPAPvuibnxS53F8btXhW82cxc09/cAmJ+m4rYxnGnafRFqgCR88ocAHFLVhyqv70aAyBMCILrmSRSmVng240ZVhfTi/R/ctiTA4K12580kE8eB0vH414UFWb3pme73ZNP0JE4bw5kmw3n9DVvyqvprAAdFZGnl0LkAftbouCSjNMtishnXtSajXCq2W/RJtUss7X5tf15/BgQeSG4z1EcB3CEiewGcDeC/JTQuyRrNemy3HXf5GuAd/wRjMNINOHqvO/ea2k1XZBpXxDPs104z3AxFsklUOuXGbsOFvibVe7eae51ak8J0zDi4C2OczWqkLurJk2dZA5I9bNIpu/uCr/Vane44DQk8kGmB97rE/E9TxfkAJivfX+X38I0Ps5zBDEORJ9nDJp3SxoffSDniTiDIJeb1awPApK80xMQYcM9/Yf2aGYRVKElT2T40jM079uPwaAkLe4pYv3opBlb0NvemNgHAqmyKg04w1l0I3EYfaSxtkCQ2ZQ8O7DS7xUxPQO6YUZUs
SSLQkidNY/vQMK7e9giGR0tQAMOjJVy97RFsHxpu7o1tA4DL10xb9G5a4JGDjTX6SBNRWUY6idDdslXfVcUds+0vgc8vsbfQbTfCkbqhyJOmsXnHfpTK1Y/rpfIENu/YbzeAafdkFHHSNDvZJaMTiC55UEd5h9LzjoVeqNn4HkxaUixTCkWeNI3Do8HiaTpeRSM1buKkaXa8wNgEhdUcqDZRLgGz5jjZNVEwxbKpUORJ01jYE1wvxXS8ikZq3AD2G1s6QWByeTuxNeGmSMYV+tILwMA/VGfb+OeRkdIB7QxFnjSN9auXolio9vsWC3msX73UcIWHRndPhrl6/N2lGhHANDA54eSq18vYUec7i9tNq3tR9WJ71ZPVot9u9WsyCrNrSNNws2jqyq6xrXETtOkJCK4b7+J9r/R8ZTdrxjcslZ439761ufYbHwYu+ZKngYinm9a+e2ozaUwWejvXr8ko3PFK2pOwTj2Ap1NRQFu7WcXg9D3X3RC0eBTnO2V0OzUIa4PkgHf8c7BIf+uTwK4vT6dHFuYCF91MQU+YVnaGIiRZTFUBAZ/4+4yUcohQh6VFll4ALt1i1+auU9HJ6rz2qaeogO+rfBTY/lfT55KWQUuepAtjo+oEcNvdUejDkTyw8gPVLRNNtEsP14xAS55kH5vAa72ulyMHHetzslzf3DoFnQAGb4NVDKPjU1RbD7NrSLqISnksFIG3f96TJx8TCrwllh6ATkhRbXMo8iRdBKbxVXZkelPy3NQ9k9DH6d1K6iNXYA58G0B3DUkXUW3a3Bx4t+jY1NZ9j+WZnw0c/91MzzyDhKSdFuc7T1Te30sGW+ulAYo8SR+mXGt/2uVUTrhbf6WyPX/saAI14tudJuf9F4rAWe+uVOuMEG7/74XVJ2cUijzJDqHFxnQ602Njz4xOqzVUPm/QXoIkiLNTNaxEBUW+6dAnT7JDVCaHmxbZScHAS/+Xk/8/FZuIUVHSVIq4uy+eOKe9wXfKociT7BAp3lJfDZa0cuQgsG2tU+MdmBZ8m6BzfraTC29bsjkMNvhuKYmJvIjkRWRIRL6V1JiExCJSvHXaReAvRbzkj2dqljNMxU3j9YPPtqjzrgosXmVfsjmMOPX9SeIk6ZP/OIDHALw8wTEJscPN3ojaAOW6CIKCt59fku2ArOsHt3GTTJadJ4DuPqcImRtgdUs9xxH6qIwo0lQSEXkRWQTgAgB/C+CTSYxJOpR6Uu2CipmZCHMRvP3z9uOklbjlGtwWf97X9WTGsPpky0jKXXMzgE8DMHb9FZG1IjIoIoMjIyMJ3ZZkinq7QcVp4efWRg+i4sYZl4J9Lopti7sswb6sqaJhkReRCwE8o6q7ws5T1S2q2q+q/QsWLGj0tiSL1NsNKk6Whtt/1CD0v/k/tyI/WbbLQenuc8rpduLuWWbGpIYkLPlzAFwsIr8C8DUAbxGR2xMYl3Qa9abaxc3SCFk4TnluJyRA4Wss+0LR8VV/82Pp9+N39zmZN7GuYWZMWmhY5FX1alVdpKqnAXgXgAdU9b0Nz4x0HvWm2p1+HmLlfwPmhcPgp6kZfWLC8VVnwX9/5JDjqrIt6MbMmFTBPHnSPtSTard3q1PXPO6OTtPCYbtWTI7Fu187434XNotlcT77sqaMRMsaqOqPAPwoyTFJB1FPqp0p6Cp5p5NRcZ5TjMxbQjhk4XjmpFU45dlgl01DFOcD48edjklthTgB7s8vAcZeRM1iWZgLlI8x7THFsHYNaS/iptoZ3S6TwMZR598x0jJf8dEd+M3/WI1TntsJKBIQe3F2mbr329iDljYM77+ykvPuq2ljiit0zQfWHZ6p2ZEmQJEn6aZ7UXDut9cdE7RwhAj/Kz66Y/qcez7kqWZZB4Uuu/kCjrXf7CCum/Muuemm22Ewiyb10CdP0k0cP75ba35jt1PTJSoff/ka4B3/1Nj8ykerxzY1Pem/ErjqyfhZLlXD5JyxbHL3bQQecBalb30SuG6+871d
N995TVIDRZ6km6A6NEGBwaqNVkCNy6Rccqz2IKFvNA/em7IZNN9LtwAX3ui838gmI8k7Y3UllLdfKALzX+VY/+7TjFayiij0qUFUZ94/2N/fr4ODgzN+X9Ictg8NY/OO/Tg8WsLCniLWr16KgRW9rZ5WNW63qCgKxdpFwlTTpjAXwKR9GmV3X208wO82ilt2IOgeRw6hLr9/rgDMeRlQemHahWVyV0keuDbl+wNSiIjsUtX+ONfQJ08aYvvQMK7e9ghKZUcIhkdLuHrbIwDQXkJv61sOamZResFw7jHHcv7uVXa+dG/pX7fw156vVndMimrwUSiGLypxFwvJORUnTQFpt0yxn0biFGRGobuGRLJ9aBjnbHoASzZ8G+dsegDbh4an3tu8Y/+UwLuUyhPYvGN/XeM1jTg7NP0LQtgmreVr7Er3TuEp/Tt4W4Bghwi8m6MetmnJFWvbevmui2fdo8EZR6bGIabjpO2gyJNQXEt9eLQExbSl7grz4dFgq9J0PGq8phFH+IrzKgHaHufn6eeFB3frzkCJ4VLJFaYbY6971AnQmuYU5PfvvxKBG50my+FxgJUfiHectB1015BQwiz1gRW9WNhTxHCAoC/sCRZU03if2roHgJ2Lp64YQNBGK7+7BHDEdOzFaffLkYPOOWFNq03uEckn59Z43furLe2ojWNBaaODtwWPHbZIuQHhXV9xPovkHYF3j5O2hyJPqvALaJCAA9OW+vrVS6t88gBQLOSxfvXS0Ov8TKha+fIbigEECd/iVdVCOXa01r9eLgF7v2bOsjn3mto69IWiszD4F5EaLJtsP3Gf8zNqY1fY+zZ7CoK48EaKeoqhu4ZMEeRKMW34dC31gRW9uOHSZejtKUIA9PYUccOly4yCa7LwgWhfPlBfDCAU1/2xcRRY9yjUFGQdO2rOqzelcV54o8+H7vs2C0Wg/4PT14X5uY8ciq63H/U+2/B1JLTkyRRBAqqotTX9lror6O4TgCu4XqF3nxDchcNku7qWvsklEzcGEIftQ8P4Az0JvfJs9Mn+LBxTOQbvcRsrfNtaBH473YvC6+0vXxP9/vI1wIGd1a6Xs97NejQZhyJPpjAJpcKx0E0+8CgXiv/9MOdETgT/dfsj+Pqu4cDxTC6knAi2Dw03lLa5ecd+rCyvwRcL/2BXsyZuwDWqLo8rwoO3oepbcq3tbWvD5xFVj9+t2Ond2LTnq47LikKfWeiuIVOYXCm9PUU8uOEteHLTBXhww1tqhDTKhRL0PhBc1HZCFXfsPGAMzppcSK5Pv5EsncOjJdw7+UY8ryfaXdCMxhkX3uikNAbt4I2qtx/1fr2dt0iqoSXfYfjdIG8+YwF++PgIDo+W0F0soJAXlCemrciwIKpLlAsl7AkhyHVjsvQnKruzTe97s37qwX1KuG78/dhUuAVdMl0zXv0VKZP0ZQe5cdY9WnueKcDrziPq/Xo7b5FUQ0u+gwgKrN6+88DU69FSGVBgXlfBKojqYnoCcI+HBVuTLqpRj2/e3ZzlPiXcO/lGbCj/BQ5NnoxJFRyaPBn/e+KtU6/R3YeHl12Hc75zcuMbuuI0L4+q0xP1fr2dt0iqYe2aDsIVsihc94wtfp874DwBuAtE0PvNIom5hwWGeysxibDPGwtTTZ3uvmBrvhHcBcVv6bPTU2pg7RoSiq2VG9cadoVt4737nKcBACcUcjXvf+Ku3bHGjSIo6+fNZyyYWszyIphQnRLmIAE2ZRT1FAs4Pj4ZmP8ftUHMSJBbZiZdKPV03iKph+6aDiLMbRJ2nm2tmePj0zXKXzhWrgqEDqzoRU+xEHhdVyGHXsu5uRQLebxn1eKq/Px3ruzF13cNTz2tuD78sNIJpgXtSKlszP+vK43T5JYpzgs+v1kuFN++AAp89qEln1GC8syD3Ax+/IFW2x2mNtZteSK4UYWI4MENbwl1JxULOZxQyOOFY86TwpxZOfS/cj6uH1g2dc45mx4wfjY3O2fdXbur0kDD
yjIMrOgNtMzjlnIAYM5smVWsrSzJDUokQRq25EWkT0R+KCKPicg+Efl4EhMj9WMqAgYA71zZW5OC6L52reHNO/ZPWe0b791ntcPUZMUOj5amxjo6FizA7vEwS/idKxfhpfL0IjFaKmPdXbtxmufpIsrNdIH8GD+e/TH8uPQO/MH2N+Hhe/8Z61cvRbFQvdPU6/bxP71sHxrG0ePjNWNHZiGZ3C+lF+yanhBSJw0HXkXkVACnqupPReRlAHYBGFDVn5muYeC1uZgsYtclYnrPxtJ3EQBPbrog8p62/GrTBcYxioUc5s+dEzp+sZDHCYXclKXv5+LcT2rSIo/pbOxbeT2G+y6sSSv1bsZyx3fdQf7vZ+7sPAr5HI6UyuaCaTMZYPUTo5E5aW9aEnhV1acBPF359+9E5DEAvQCMIk+aSz0+Y7ccgW0GjOuasC1XEEZPsWC0kAFgfFIjF5BSeSJ07p+etbVK4AGgS8awcNcXMNx3YVVGzorP3Rf49HLnQwen/PyAs3B8etZWLJRncXjiZHwhtwb3jr4xuGBaVA57s/Bn1LixAIBC3yEkGngVkdMArADwUMB7a0VkUEQGR0ZGkrwt8dHTFRzg7OkqhOa022bVuK4Jr1sImN7cFIccgAvPOhVXb3tkKjPHT3lCkbeqM2BmoaEezal4rsr1tH1o2Pg04Bf4TYVbsCj3LHICLMo9i02FW3Bx7ifBBdNse9EmDXe5djyJibyInAjg6wA+oaq/9b+vqltUtV9V+xcsWJDUbUkAJg+cKow+6PWrlxoXgHldhcAsE1P6YRwmAdz50MHIJ4iJBt2Kh/Vkw/GTqhY322qWpieDT89yNjEFLpityGzhLteOJ5HsGhEpwBH4O1R1WxJjkvo5YrCIj5TKNRUj/T7koE0+11702sAskyQqPwKNC7gNXxhfE+iT/8L4mqrFzfYzmZ4MFspzzs+YKaFNo94a8iQzNCzyIiIAbgXwmKqys0AbEJXiZ0oNDFsAvL53d5OR+9NPsZDDS+XJxEsWNMK9k28Eyqj40J/DYT0JXxhfg3sn34h5Y+NTFSx7ugpGd42Xw3oyFgUI/WE9yarez4zRqlgAaRuSyK55I4AfA3gEztM3AHxGVb9juobZNc0lbKt+2O5P/xiu2Pd0FfDiS+MoT0b/rRRyAgiqipylATd75q7/d9Dqc16c+wn+fu6Xq8TzmM7G38iH8L3cf8LosZBMm5mG2TWZoZ7sGtauaRF19SmtY/ygrJeoOiv11prp7Sni2Ni4lSXcjpieTEx84fcfx5ojX54Sz4df/VG8/+FXJlPThpAAKPIpIaqgl+maehaFsJx5UyGvenLe3fmvu2t3W7lpmkleBL+44U+nXod9b7ZPUISEwQJlKSFugatGmleH5czHbbEXhls2wNan3Y7EteT954Z9b7EajhOSICxQ1gLiblZqpHm1Kcujp6uA9XfvqSp9sP7uPdg+NFx3ZsiEKl58aRyFfGM57a1A4Mw/zsz9uftR31tDDccJqROKfAuIarLhp5Hm1aa8+OPliZrgaHlC8Ym7djdUnqA8qZg7e1bsqpKtxBuz8G7o6u0p4r2rFiNnUP7L39BX9Trou/aTVNopIbZQ5FtA2IakIOIuCi6uO6ZUnpiyOt3NTMfKwRUhk2C0VMb61UuNpYXbjaD2g27M4vqBZbhxzdno8tTHFwHeu2pxVQVMwHHDuOWJTbRN/jzpGCjyLcArBjZt9uIuCgBqSg5MqE5dMxM+4au3PYKx8eZ3gmoWrsU9vVBOoreniJsvOxtP3nBBjcC7DKzoxYMb3oKbLzs79u+MkGbAwGuLMG1IMp0LmHepBhEV3BUxlz9Igplo9Qc4deUnVRPPy1/YU7QOeIdlPjUzTZYQGyjyKSHOogBE+/FbkDnbFLzdqJLkzWcssMqCiloIKOqk1dBdk1FMvt/uYgHnbHpghmeTPu586KAxAO09bttUhZBWQZHPKEF+/EJOcHRsvKHsmU4hLF/eDWJv
Hxo2lkfmd0zaBbpr2owg/y4Q37cb5BNOc8mBdsJdAMKs9Ubr3xOSFBT5NiLIv/uJu3ZXnRNn56TfJ7xkw7et5lHICU48YdZUka03n7EAd+w80DHlCqJwUyTDct4nVAMrd0aVN2h2TSPSedBd00bYtt+r1+drk6OdF8HmPz8LQ9ech5suOxsAcMfOA8iaYfqKl82eSmHtKRYwr6sAQbQF7k2DDPs+BZjaUQxMW//uIu02BvdiasAedC4httCSbyPi7IYMOjfMCgzroeplUnWqfrz3qSIr2Tguv/ndWOCGpjhlmtevXor1/7onsDSxwlxu2VSnKG5NI0JsoMi3EaZmH6ZzvYSl8gG1HZ/Cxt0+NIxPbd0zIx2bWslXHzoQuGsVsIuBuMc23rvPGIA1EbRIN1K+ghATFPk2Yv3qpVZiHLRzMqqIme3mpNFjY1j/r9kXeAAw9QaJu1FtYEUvlmz4dqyYRZCrJ6qjFyH1QJFvASa3iiss131znzELJi+Cd66sFaGkrMCjY+ktRdAMbAOhcZ7CTOUNghZ5lkIgjcLA6wwTFVwbWNGLay96rbFc74Qqvr5ruCYYF1ZSmJZgMMVC+J9/nEDo+tVLa35nOQHmdTlF2vwF4kzunzg1jQixgZ2hZhibTk02nZn8nZ22Dw1j/d17aoJ9hZzgstf34fadBxKYfXbIAXj3qsX44eMjRis96nfltfK7iwX89qVylQuokHMylSjSJCnq6QxFS36GsXGr2LhY/OcMrOjF3Nm13rfypOKHj49MWZSdTF5kykL+w1fPxx07D4Ra6VFdtbxW/mipXOPjL08qyxuQlpOIyIvI+SKyX0R+LiIbkhgzq9jUhrdxrwSdc8SQ4XF4tIRrL3ptZEOLrDOhOuU7f/AXz9cESv37D8J+V7Z7GpgZQ1pNwyIvInkAXwLwdgCvAXC5iLym0XGzik1t+KgOQ6ZgXJgouf7etDTyaAaC6Joy3vfDfle24s14CGk1SVjyrwfwc1X9paqOAfgagEsSGDeT2ATX/OfM6yqgp1iIDMZFLSADK3oxd07nJlTZRp+8QXDT78pGvJkZQ9qBhgOvIvJnAM5X1b+ovH4fgDeo6kd8560FsBYAFi9evPKpp55q6L4kGH8wUARTNWggU8ZHAAANt0lEQVTWr16KdXftZg2aCIqFfGRWy/ah4dDvMqpGDSH10KrAa1CuX83fvqpuUdV+Ve1fsGBBArclQbjt52667GwcH5/EC8fKU4HFdXftRtfszvbL22BTG2hgRS/es2pxzR9/sZDHzZedjQc3vIUCT9qCJJ7dDwHwtq1fBOBwAuN2JElVIQwKDCqczU6FvCTeLi9r2Pjcrx9Yhv5XzmfVSNLWJCHyDwM4XUSWABgG8C4A705g3I7DtqeoDWEiNXf2LMydM2vKpTM2PoFj5ea00UsrtgFTf60b9wmAQk/ahYZFXlXHReQjAHYAyAO4TVX3NTyzDiTJKoRh2+yPlMrYfe15Vcc6pSiZDYWcWAdMk1yYCWkGieTJq+p3VPX3VfXVqvq3SYzZiSRZhXD96qWBwRIg2EodWNGLSQo8AODEE2ZZC3RUYThCWg13vLYRNhulbAkLDJqsVOZ0O4zGaJHI8sCk3aHItxE2G6XicP3AMtx02dnWBa/CrP9OIs5il+TCTEgz6NydMW2CP5vmnSt7Q4tmxSVubfTBp57v6H6ucRdVlgcm7Q5FvoUEBe2+vms41NpudqNnf1pgmsT+vasWW1fbdFv69QRsGIvzfcbpJEVIK2Cp4RZiU3bYS1D/UZvdmfWSZMbN6afMxRPPHE1gVmZ6ioVYbfhM37OXZi+qhMShnh2vtORbSNygnW2KZRLC5C4oSQh8b08RvxipX+BtN28l0WfVC9MjSRZg4LWFxA3a2SwKcboZhWFbStdPLiBye/T4uLGfahRzZ+ftK4vFJCo4yvRIkgUo8i0kbjaNzaJw3Tf3JSJMNimAPcUC5nVNV8e8+bKzceOas2salMS1
sL0cHZtAud4VIoLh0RLO2fSAcQFkeiTJAnTXtJC4QbuoTI7tQ8PGBuBhwhTk3olqTF0s5LHx4tcGznXzjv3GeSRJXgSTlUYgx8bGA++ZFwl1OYW5YEzfAdMjSZqgJd9i3KqRT266ILJyYVQt+jBr3SRMJvfOm89YENq4JOzpIKoxR1K4nZ7Wr16KC5afGnjOqlfNi+yIZfosSe9bIKQV0JJPGWF572HWukmYTH7nHz4+ghsuXYbNO/YbRTvoftuHhqfSE2cCd1GaMyvYXvnVc6WpzxGWEhr0Wdzv+bpv7pt6SjDdh5B2hX+xGcJkrfcUC7EXhsOjpamnjF7DuArU+LQ379gfS+CT2GFbKk8Y/f7ez/HkpguMnyXMBfOSp0LnaKlcVyCbkFZBkc8QJvfCxotfa7zGJpgb1nPWn70TNyj5nlWLm9pg3P/54rpgmGFD0g5FPkPY9I/1c9pJwSLvPe4dNwiv6MUJSvYUC7h+YFnNnAuGv8q4Vn+QeMf9jphhQ9IOffIZI06tGgDY+csXrI674y7Z8O1Ad4wrekEZQEEUcjL1hOGd8/ahYay/ew+CvPo2biBvxo0pUynOd8QMG5J2KPIdjim90HTcRvROKOQCRd4NyIY1ud68Y39DrQknVPGrTRfUfb0fFiAjaYci3yGYSh2Y8sjzEuwcCRO9oNo6XlyB99eL8c4tiaycczY9kFiNmaC9DG8+YwE279iPdXftZj0b0vZQ5DuAsBosl7+hL7By4+Vv6Ks55jJn1rSlPq+rgGsvcjZFnbPpgUg3zeHRUpWo93QV8OJL44nuavV+PqDxCpF+dxLr2ZA0QZHvAMIyRFyr+s6HDmJCFXkRXP6GPvS/cj7O2fRAlTgCqLHUvemFNsHI7mKhaox6d8ZG7WQtlSfwmW17oZBEBTnJPryEzAQU+Q4gKkPk+oFluH5g2dTxIGt13V27A10pXoHrjij1WyzkIYK6Cp95cZ8eogK8xzwLUNB864HZNiRtNJRCKSKbReRxEdkrIveISE9SEyPJEbfaZZC1GuZMGR4t4ezr7sPvjo8bz8mL4IZLl1n3Tw2qZuny4kvOfW64dBl6igXziQYaEWS2+yNpo9E8+e8DOFNVlwP4dwBXNz4lkjRxNwDVI4KjpTImQvzqk6oYWNFrJYaFvISuKuVJnbLG586J/zDaiCCzng1JGw2JvKrep6qu+bYTwKLGp0SSJu4GoGZYpe6YQSJZyIvThq8yt7mzZ6HW0VLNcCWAG3dBalSQ69lwRkgrSaz9n4h8E8Bdqnq74f21ANYCwOLFi1c+9dRTidyXJE9UKmRc/C0KozpXmTZcBY17QiFnHbzNi+C/rzmLgkxSS1Pa/4nIDwD8XsBbn1XVb1TO+SyAcQB3mMZR1S0AtgBOj9c4kyQziyuCpv6uAqCnq2Alrj3FQk3d+agdp1G17F1K5QnMmZVDsZC3WpBclxEhnUSku0ZV36qqZwb85wr8FQAuBPAebUVXcNIUBlb0YtLw61QA11702lq3S05qOkXtvva82MIaVhDNz5FSucZ94u9M5cLgKOlEGkqhFJHzAVwF4I9V9VgyUyLtgsmi7u0pxu5qFYegsU2dnxZW5uJvZM5SBIQ4NOSTF5GfA5gD4LnKoZ2q+qGo6/r7+3VwcLDu+5KZwSSWrQg0xp1LlN+fkDTSFJ98GKr6Hxq5nrQ3zbTWmz2XuNU4CckqiWXXxIGWPCGExGfGLXmSLejiICR7UOQJAFZXJCSrsP0fAcBepoRkFYo8AcDqioRkFbprCIDs9TJlfIEQB1ryBEC2qiu68YXhSjtBN76wfWi41VMjZMahyBMA2aquyPgCIdPQXUOmyMoGIsYXCJmGljzJHOzeRMg0FHmSObIUXyCkUeiuIZmjnWruENJqKPIkk2QlvkBIo9BdQwghGYYiTwghGYYiTwghGYYiTwghGYYiTwghGYYiTwghGYYiTwghGSYRkReRvxYRFZGTkxiPEEJIMjQs8iLSB+Bt
AA40Ph1CCCFJkoQlfxOATwPQBMYihBCSIA2JvIhcDGBYVfdYnLtWRAZFZHBkZKSR2xJCCLEksnaNiPwAwO8FvPVZAJ8BcJ7NjVR1C4AtANDf30+rnxBCZoBIkVfVtwYdF5FlAJYA2CMiALAIwE9F5PWq+utEZ0kIIaQu6q5CqaqPADjFfS0ivwLQr6rPJjAvQgghCcA8eUIIyTCJ1ZNX1dOSGosQQkgy0JInhJAMQ5EnhJAMQ5EnhJAMQ5EnhJAMQ5EnhJAMQ5EnhJAMQ5EnhJAMQ5EnhJAMQ5EnhJAMI6ozXxBSREYAPNWEoU8GkLbaOZxz80nbfIH0zTlt8wXSOeelqvqyOBckVtYgDqq6oBnjisigqvY3Y+xmwTk3n7TNF0jfnNM2XyC9c457Dd01hBCSYSjyhBCSYbIm8ltaPYE64JybT9rmC6RvzmmbL9Ahc25J4JUQQsjMkDVLnhBCiAeKPCGEZJjMiryI/LWIqIic3Oq5RCEim0XkcRHZKyL3iEhPq+cUhIicLyL7ReTnIrKh1fOJQkT6ROSHIvKYiOwTkY+3ek42iEheRIZE5FutnosNItIjIndX/oYfE5E/bPWcohCRdZW/iUdF5E4ROaHVc/IjIreJyDMi8qjn2HwR+b6IPFH5OS9qnEyKvIj0AXgbgAOtnosl3wdwpqouB/DvAK5u8XxqEJE8gC8BeDuA1wC4XERe09pZRTIO4FOq+h8BrALw4RTMGQA+DuCxVk8iBl8E8D1VPQPAWWjzuYtIL4CPAehX1TMB5AG8q7WzCuQrAM73HdsA4H5VPR3A/ZXXoWRS5AHcBODTAFIRVVbV+1R1vPJyJ4BFrZyPgdcD+Lmq/lJVxwB8DcAlLZ5TKKr6tKr+tPLv38ERn97WziocEVkE4AIAt7R6LjaIyMsBvAnArQCgqmOqOtraWVkxC0BRRGYB6AJwuMXzqUFV/y+A532HLwHwL5V//wuAgahxMifyInIxgGFV3dPqudTJBwF8t9WTCKAXwEHP60Noc8H0IiKnAVgB4KHWziSSm+EYKJOtnoglrwIwAuDLFRfTLSIyt9WTCkNVhwH8HZwn/acBHFHV+1o7K2teoapPA44RA+CUqAtSKfIi8oOKL83/3yUAPgvgmlbP0U/EnN1zPgvHxXBH62ZqRAKOpeJJSUROBPB1AJ9Q1d+2ej4mRORCAM+o6q5WzyUGswC8DsA/quoKAEdh4UJoJRU/9iUAlgBYCGCuiLy3tbNqHi2pXdMoqvrWoOMisgzOL26PiACO2+OnIvJ6Vf31DE6xBtOcXUTkCgAXAjhX23PzwiEAfZ7Xi9CGj7h+RKQAR+DvUNVtrZ5PBOcAuFhE/hTACQBeLiK3q2o7C9AhAIdU1X1CuhttLvIA3grgSVUdAQAR2QbgjwDc3tJZ2fEbETlVVZ8WkVMBPBN1QSoteROq+oiqnqKqp6nqaXD+AF/XaoGPQkTOB3AVgItV9Vir52PgYQCni8gSEZkNJ1B1b4vnFIo4K/2tAB5T1RtbPZ8oVPVqVV1U+dt9F4AH2lzgUfl/66CILK0cOhfAz1o4JRsOAFglIl2Vv5Fz0ebBYg/3Arii8u8rAHwj6oJUWvIZ5H8CmAPg+5UnkJ2q+qHWTqkaVR0XkY8A2AEnG+E2Vd3X4mlFcQ6A9wF4RER2V459RlW/08I5ZZGPArijsvj/EsB/bvF8QlHVh0TkbgA/heMeHUIbljgQkTsB/AmAk0XkEIBrAWwCsFVEroSzWP155Djt6RkghBCSBJly1xBCCKmGIk8IIRmGIk8IIRmGIk8IIRmGIk8IIRmGIk8IIRmGIk8IIRnm/wMNgJY3vEOMdAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(0)\n",
    "plt.xlim(-5, 10)\n",
    "plt.ylim(-5, 10)\n",
    "\n",
    "plt.scatter(dist_01[:, 0], dist_01[:, 1])\n",
    "plt.scatter(dist_02[:, 0], dist_02[:, 1])#, color='red')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
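    "Let us now represent the data in tabular form: one row per point, with the two coordinates followed by a class label. Points from dist_01 get label 0 and points from dist_02 get label 1."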
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1000\n",
      "500\n"
     ]
    }
   ],
   "source": [
    "r = dist_01.shape[0] + dist_02.shape[0]\n",
    "c = dist_01.shape[1] + 1\n",
    "data = np.zeros((r, c))\n",
    "print(data.shape[0])\n",
    "print(dist_01.shape[0])\n",
    "\n",
    "data[:dist_01.shape[0], :2] = dist_01\n",
    "data[dist_01.shape[0]:, :2] = dist_02\n",
    "data[dist_01.shape[0]:, -1] = 1.0\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now shuffle the data and check by printing the first 10 rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5.01160508  5.33554096  1.        ]\n",
      " [ 3.52894906  0.67503824  0.        ]\n",
      " [-0.67067924  2.47052769  0.        ]\n",
      " [ 4.94536789  4.78631836  1.        ]\n",
      " [ 0.53639697  2.03772055  0.        ]\n",
      " [-0.23305406 -0.38586786  0.        ]\n",
      " [ 4.16792974  4.59503355  1.        ]\n",
      " [ 0.65551105  1.03051941  0.        ]\n",
      " [ 2.84191517  4.47805427  1.        ]\n",
      " [ 3.2877785   3.7208029   1.        ]]\n",
      "[[ 5.01160508  5.33554096]\n",
      " [ 3.52894906  0.67503824]\n",
      " [-0.67067924  2.47052769]\n",
      " ...\n",
      " [ 4.55421775  6.00010738]\n",
      " [ 3.9578194   4.99936198]\n",
      " [ 0.94714702  0.15911902]]\n"
     ]
    }
   ],
   "source": [
    "np.random.shuffle(data)\n",
    "print(data[:10])\n",
    "print(data[:, :2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Implementation.** Next, we implement our KNN algorithm. There are many ways to do this, but a basic approach will require a pairwise distance measure for instances, and a way to take a \"training\" dataset of classified instances and make a prediction for a \"test\" data instance. Here is a top-level outline:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "def distance(x1, x2, p=2):\n",
    "    # Minkowski distance of order p (p=2 gives the Euclidean distance)\n",
    "    return (np.sum(np.abs(x1 - x2)**p))**(1/p)\n",
    "\n",
    "def knn(X_train, y_train, xt, k=7):\n",
    "    # Euclidean distance from the query xt to every training point (broadcasting)\n",
    "    dist = np.linalg.norm(X_train - xt, axis=1)\n",
    "    # indices of the k smallest distances (not sorted among themselves);\n",
    "    # using k - 1 as the pivot also works when k equals the number of training points\n",
    "    index = np.argpartition(dist, k - 1)[:k]\n",
    "    k_nn = y_train[index]\n",
    "    # majority vote: return_inverse=True gives, for each label, its index in the sorted unique array u\n",
    "    u, indices = np.unique(k_nn, return_inverse=True)\n",
    "    return u[np.argmax(np.bincount(indices))]\n"
   ]
  },
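  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Sanity check.** As a quick check of the two functions above, the sketch below builds a tiny hand-made data set (the names `X_toy` and `y_toy` are our own and not part of the exercise). It is meant to confirm two things: that `distance()` with `p=2` agrees with the Euclidean norm used inside `knn()`, and that a query lying next to the class-0 points is predicted as class 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# hypothetical toy data: three points near the origin (class 0), two near (5, 5) (class 1)\n",
    "X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])\n",
    "y_toy = np.array([0, 0, 0, 1, 1])\n",
    "\n",
    "# the two numbers printed here should be equal\n",
    "print(distance(X_toy[0], X_toy[3]), np.linalg.norm(X_toy[0] - X_toy[3]))\n",
    "\n",
    "# the three nearest neighbours of (0.5, 0.5) are all class 0, so this should print 0\n",
    "print(knn(X_toy, y_toy, np.array([0.5, 0.5]), k=3))"
   ]
  },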
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now check to see if we can make a prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n"
     ]
    }
   ],
   "source": [
    "test_point = np.array([8, -4])\n",
    "# Run the line below and check that it comes out as 0.0\n",
    "print(knn(data[:, :2][:10], data[:, -1][:10], test_point))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0 1 2] [1 0 0 0 2 1 1]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
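    "# small check of the majority-vote step used in knn()\n",
    "a = np.array([1,0,0,0,2,1,1])\n",
    "u, indices = np.unique(a, return_inverse=True)\n",
    "print(u,indices)\n",
    "# 0 and 1 both occur three times; argmax breaks the tie in favour of the smaller label\n",
    "u[np.argmax(np.bincount(indices))]"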
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a train and test split of the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(750, 2) (750,)\n",
      "(250, 2) (250,)\n"
     ]
    }
   ],
   "source": [
    "np.random.shuffle(data)\n",
    "split = int(0.75 * data.shape[0])\n",
    "# print split\n",
    "train_data_X = data[:split, :2]\n",
    "train_data_y = data[:split, -1]\n",
    "test_data_X = data[split:, :2]\n",
    "test_data_y = data[split:, -1]\n",
    "\n",
    "print(train_data_X.shape, train_data_y.shape)\n",
    "print(test_data_X.shape, test_data_y.shape)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Implementation.** Next we need a way to run our KNN classifier on every test point and compare the predictions with the true labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.996\n"
     ]
    }
   ],
   "source": [
    "def get_acc(kx):\n",
    "    # fraction of test points whose predicted label matches the true label\n",
    "    error = 0\n",
    "    for idx, j in enumerate(test_data_X):\n",
    "        y_hat = knn(train_data_X, train_data_y, j, kx)\n",
    "        if y_hat != test_data_y[idx]:\n",
    "            error += 1\n",
    "    return 1 - error/test_data_y.shape[0]\n",
    "\n",
    "\n",
    "print(get_acc(7))"
   ]
  },
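  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Optional cross-check.** One way to gain confidence in a hand-written classifier is to compare it with a library implementation. The sketch below assumes that scikit-learn is installed, which this notebook does not otherwise require, and uses its `KNeighborsClassifier` with the same `k`. The two accuracies should be close, although they need not be identical because ties may be broken differently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# assumption: scikit-learn is available in this environment\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "clf = KNeighborsClassifier(n_neighbors=7)\n",
    "clf.fit(train_data_X, train_data_y)\n",
    "# score() returns the mean accuracy on the given test data\n",
    "print(clf.score(test_data_X, test_data_y))"
   ]
  },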
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What accuracy did you get? You should get around 99 percent on this dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's try different values of K."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "k: 2 | Acc: 0.996\n",
      "k: 3 | Acc: 0.996\n",
      "k: 4 | Acc: 0.996\n",
      "k: 5 | Acc: 0.996\n",
      "k: 6 | Acc: 0.996\n",
      "k: 7 | Acc: 0.996\n",
      "k: 8 | Acc: 0.996\n",
      "k: 9 | Acc: 0.996\n",
      "k: 10 | Acc: 0.996\n",
      "k: 11 | Acc: 0.996\n",
      "k: 12 | Acc: 0.996\n",
      "k: 13 | Acc: 0.996\n",
      "k: 14 | Acc: 0.996\n",
      "k: 15 | Acc: 0.996\n",
      "k: 16 | Acc: 0.996\n",
      "k: 17 | Acc: 0.996\n",
      "k: 18 | Acc: 0.996\n",
      "k: 19 | Acc: 0.996\n"
     ]
    }
   ],
   "source": [
    "for ix in range(2, 20):\n",
    "    print (\"k:\", ix, \"| Acc:\", get_acc(ix))"
   ]
  },
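  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A long list of accuracies is easier to read as a chart. The sketch below is one possible way to plot accuracy against $k$ using the `get_acc()` function defined above; the variable names `ks` and `accs` are our own. On this well-separated data set the line may well be flat."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "ks = list(range(2, 20))\n",
    "# one accuracy value per choice of k\n",
    "accs = [get_acc(k) for k in ks]\n",
    "\n",
    "plt.figure()\n",
    "plt.plot(ks, accs, marker='o')\n",
    "plt.xlabel('k')\n",
    "plt.ylabel('accuracy')\n",
    "plt.show()"
   ]
  },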
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Now let's try real data: MNIST"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import datetime"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course, MNIST is image data, but here we are using a CSV version where we can view the pixels as numbers (each row has the pixel data for an image of a digit, and the first column is the class of the digit, i.e., 0-9)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2499, 785)"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('train.csv')\n",
    "df.head()\n",
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the dataset is quite big, we will just use a subset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2000, 785)\n"
     ]
    }
   ],
   "source": [
    "data = df.values[:2000]\n",
    "print (data.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make a train/test split of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1600, 784) (1600,)\n",
      "(400, 784) (400,)\n"
     ]
    }
   ],
   "source": [
    "split = int(0.8 * data.shape[0])\n",
    "\n",
    "X_train = data[:split, 1:]\n",
    "X_test = data[split:, 1:]\n",
    "\n",
    "y_train = data[:split, 0]\n",
    "y_test = data[split:, 0]\n",
    "\n",
    "print (X_train.shape, y_train.shape)\n",
    "print (X_test.shape, y_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us just check that our data really does represent images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAADG1JREFUeJzt3X/oXfV9x/Hn28z8oQ1EKdpgs6WWODY1S0eQQXQ4q8WNQoyxUv8YGStNwQor7A/FfyqMgoy1W/8KpCQmQpO2YJyhlrU1jBlxiFFikzazFcnaLDHfiNVYQYrJe398T8q38XvP/eb+Ojd5Px8Q7r3nfe4975zk9f2ce8+5309kJpLquaTrBiR1w/BLRRl+qSjDLxVl+KWiDL9UlOGXijL8UlGGXyrqDya5sYjwckJpzDIzFrLeUCN/RNwZEa9GxGsR8dAwryVpsmLQa/sjYhHwc+AO4CjwInBfZv6s5TmO/NKYTWLkvwl4LTNfz8zfAt8B1g3xepImaJjwXwP8as7jo82y3xMRmyJif0TsH2JbkkZsmA/85ju0+NBhfWZuAbaAh/3SNBlm5D8KLJ/z+OPAseHakTQpw4T/RWBlRHwiIhYDnwf2jKYtSeM28GF/Zn4QEQ8APwQWAdsy86cj60zSWA18qm+gjfmeXxq7iVzkI+nCZfilogy/VJThl4oy/FJRhl8qyvBLRRl+qSjDLxVl+KWiDL9UlOGXijL8UlGGXyrK8EtFGX6pKMMvFWX4paIMv1SU4ZeKMvxSUYZfKsrwS0UZfqkowy8VZfilogy/VJThl4oy/FJRA0/RDRARR4B3gdPAB5m5ZhRNSQD79u1rrW/fvr21vnXr1hF2c/EZKvyNv8rMN0fwOpImyMN+qahhw5/AjyLipYjYNIqGJE3GsIf9azPzWERcBfw4Iv4nM5+du0LzQ8EfDNKUGWrkz8xjze0M8CRw0zzrbMnMNX4YKE2XgcMfEZdHxJKz94HPAIdG1Zik8RrmsP9q4MmIOPs6OzPzP0bSlaSxGzj8mfk68Gcj7KWsSy+9tLV+/fXXt9YPHDgwynYm5tprr22tr169urV+5syZUbZTjqf6pKIMv1SU4ZeKMvxSUYZfKsrwS0WN4lt9GtI999zTWl+1alVr/UI91bd48eLW+mWXXTahTmpy5JeKMvxSUYZfKsrwS0UZfqkowy8VZfilojzPPwXuvvvu1vrJkycn1IkqceSXijL8UlGGXyrK8EtFGX6pKMMvFWX4paI8zz8FNmzY0Frvd57//vvvH2U7E7Ny5cquWyjNkV8qyvBLRRl+qSjDLxVl+KWiDL9UlOGXiup7nj8itgGfBWYy84Zm2ZXAd4EVwBHg3sz89fjavLhFRGv9sccem1An06XffrnkEseuYSxk720H7jxn2UPA3sxcCextHku6gPQNf2Y+C7x1zuJ1wI7m/g7grhH3JWnMBj1uujozjwM0t1eNriVJkzD2a/sjYhOwadzbkXR+Bh35T0TEMoDmdqbXipm5JTPXZOaaAbclaQwGDf8eYGNzfyPw1GjakTQpfcMfEbuA/wb+OCKORsQXgEeBOyLiF8AdzWNJF5C+7/kz874epU+PuJeyMnOo+oVq1apVrfV+f+8zZ86Msp1yvEpCKsrwS0UZfqkowy8VZfilogy/VJS/uludWbJkSdctlObILxVl+KWiDL9UlOGXijL8UlGGXyrK8EtFeZ5/ApYuXdp1Cxekt99+u7W+a9euCXVycXLkl4oy/FJRhl8qyvBLRRl+qSjDLxVl+KWiPM8/AevXr++6hZ6uu+661vott9zSWh/m12dv2LChtb579+7W+vvvvz/wtuXIL5Vl+KWiDL9UlOGXijL8UlGGXyrK8EtFRb9pkCNiG/BZYCYzb2iWPQJ8ETjZrPZwZv6g78YiLs65pvvYt29fa/3mm29urb/yyiut9ZmZmZ6122+/vfW5/URE
a73L6cMvucSxaz6Z2f6P1ljI3tsO3DnP8n/NzNXNn77BlzRd+oY/M58F3ppAL5ImaJjjpgci4icRsS0irhhZR5ImYtDwbwY+CawGjgNf77ViRGyKiP0RsX/AbUkag4HCn5knMvN0Zp4BvgXc1LLulsxck5lrBm1S0ugNFP6IWDbn4Xrg0GjakTQpfb/SGxG7gFuBj0bEUeCrwK0RsRpI4AjwpTH2KGkM+oY/M++bZ/HWMfRy0dq5c2drfe3ata31G2+8sbV+6tSpnrWnn3669bmHDrUftG3fvr21Poznn3++tb53796xbVte4SeVZfilogy/VJThl4oy/FJRhl8qyl/dPQGbN29ura9YsaK1fvDgwdb6M88807P2xhtvtD63S6dPn26tv/POOxPqpCZHfqkowy8VZfilogy/VJThl4oy/FJRhl8qyvP8U+DBBx/suoWxWbJkSc/aokWLJtiJzuXILxVl+KWiDL9UlOGXijL8UlGGXyrK8EtFeZ5fY3Xbbbf1rC1dunSCnehcjvxSUYZfKsrwS0UZfqkowy8VZfilogy/VFTf8/wRsRx4HPgYcAbYkpnfjIgrge8CK4AjwL2Z+evxtaoL0YEDB3rW3nvvvQl2onMtZOT/APjHzPwT4C+AL0fEnwIPAXszcyWwt3ks6QLRN/yZeTwzX27uvwscBq4B1gE7mtV2AHeNq0lJo3de7/kjYgXwKeAF4OrMPA6zPyCAq0bdnKTxWfC1/RHxEeAJ4CuZeSoiFvq8TcCmwdqTNC4LGvkj4lJmg//tzNzdLD4REcua+jJgZr7nZuaWzFyTmWtG0bCk0egb/pgd4rcChzPzG3NKe4CNzf2NwFOjb0/SuCzksH8t8LfAwYg4e97mYeBR4HsR8QXgl8DnxtOiLmTLly/vWVu8ePEEO9G5+oY/M58Der3B//Ro25E0KV7hJxVl+KWiDL9UlOGXijL8UlGGXyrKX92tsXruued61k6dOjXBTnQuR36pKMMvFWX4paIMv1SU4ZeKMvxSUYZfKsrwS0UZfqkowy8VZfilogy/VJThl4oy/FJRhl8qyu/zqzOvvvpq1y2U5sgvFWX4paIMv1SU4ZeKMvxSUYZfKsrwS0VFZravELEceBz4GHAG2JKZ34yIR4AvAiebVR/OzB/0ea32jUkaWmbGQtZbSPiXAcsy8+WIWAK8BNwF3Av8JjP/ZaFNGX5p/BYa/r5X+GXmceB4c//diDgMXDNce5K6dl7v+SNiBfAp4IVm0QMR8ZOI2BYRV/R4zqaI2B8R+4fqVNJI9T3s/92KER8B/gv4WmbujoirgTeBBP6J2bcGf9/nNTzsl8ZsZO/5ASLiUuD7wA8z8xvz1FcA38/MG/q8juGXxmyh4e972B8RAWwFDs8NfvNB4FnrgUPn26Sk7izk0/6bgX3AQWZP9QE8DNwHrGb2sP8I8KXmw8G213Lkl8ZspIf9o2L4pfEb2WG/pIuT4ZeKMvxSUYZfKsrwS0UZfqkowy8VZfilogy/VJThl4oy/FJRhl8qyvBLRRl+qahJT9H9JvC/cx5/tFk2jaa1t2ntC+xtUKPs7Y8WuuJEv8//oY1H7M/MNZ010GJae5vWvsDeBtVVbx72S0UZfqmorsO/pePtt5nW3qa1L7C3QXXSW6fv+SV1p+uRX1JHOgl/RNwZEa9GxGsR8VAXPfQSEUci4mBEHOh6irFmGrSZiDg0Z9mVEfHjiPhFczvvNGkd9fZIRPxfs+8ORMTfdNTb8oj4z4g4HBE/jYh/aJZ3uu9a+upkv038sD8iFgE/B+4AjgIvAvdl5s8m2kgPEXEEWJOZnZ8Tjoi/BH4DPH52NqSI+Gfgrcx8tPnBeUVmPjglvT3Cec7cPKbees0s/Xd0uO9GOeP1KHQx8t8EvJaZr2fmb4HvAOs66GPqZeazwFvnLF4H7Gju72D2P8/E9ehtKmTm8cx8ubn/LnB2ZulO911LX53oIvzXAL+a8/go0zXldwI/ioiX
ImJT183M4+qzMyM1t1d13M+5+s7cPEnnzCw9NftukBmvR62L8M83m8g0nXJYm5l/Dvw18OXm8FYLsxn4JLPTuB0Hvt5lM83M0k8AX8nMU132Mtc8fXWy37oI/1Fg+ZzHHweOddDHvDLzWHM7AzzJ7NuUaXLi7CSpze1Mx/38TmaeyMzTmXkG+BYd7rtmZukngG9n5u5mcef7br6+utpvXYT/RWBlRHwiIhYDnwf2dNDHh0TE5c0HMUTE5cBnmL7Zh/cAG5v7G4GnOuzl90zLzM29Zpam4303bTNed3KRT3Mq49+ARcC2zPzaxJuYR0Rcy+xoD7PfeNzZZW8RsQu4ldlvfZ0Avgr8O/A94A+BXwKfy8yJf/DWo7dbOc+Zm8fUW6+ZpV+gw303yhmvR9KPV/hJNXmFn1SU4ZeKMvxSUYZfKsrwS0UZfqkowy8VZfilov4fpGqYXYdIZwUAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(0)\n",
    "plt.imshow(X_train[91].reshape((28, 28)), cmap='gray', interpolation='none')\n",
    "print (y_train[91])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Implementation.** Now code another ```get_acc()``` and try different values of K on our dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "k: 7 | Acc: 0.89\n",
      "k: 9 | Acc: 0.88\n",
      "k: 11 | Acc: 0.8725\n"
     ]
    }
   ],
   "source": [
    "def get_acc(kx):\n",
    "    # fraction of MNIST test images whose predicted digit matches the true digit\n",
    "    error = 0\n",
    "    for idx, j in enumerate(X_test):\n",
    "        y_hat = knn(X_train, y_train, j, kx)\n",
    "        if y_hat != y_test[idx]:\n",
    "            error += 1\n",
    "    return 1 - error/y_test.shape[0]\n",
    "\n",
    "for ix in range(7, 13, 2):\n",
    "    print(\"k:\", ix, \"| Acc:\", get_acc(ix))"
   ]
  },
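  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each prediction computes the distance from the query to every training image, so evaluation time grows with both the number of test points and the size of the training set. As a rough illustration, the sketch below times a single call to `get_acc()` using the standard-library `time` module; the variable names are our own and the number of seconds will depend on your machine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "start = time.perf_counter()\n",
    "acc = get_acc(7)\n",
    "# wall-clock time for classifying every test image once\n",
    "elapsed = time.perf_counter() - start\n",
    "print('k: 7 | Acc:', acc, '| seconds:', round(elapsed, 2))"
   ]
  },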
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
